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AN  ITERATIVE  PROPER  ORTHOGONAL  DECOMPOSITION 
(I-POD)  TECHNIQUE  WITH  APPLICATION  TO  THE  FILTERING 
OF  PARTIAL  DIFFERENTIAL  EQUATIONS 

D.  YU,  +  AND  S.  CHAKRAVORTY  t 


Abstract.  In  this  paper,  we  consider  the  filtering  of  systems  governed  by  partial  differential 
equations  (PDE).  We  adopt  a  reduced  order  model  (ROM)  based  strategy  to  solve  the  problem. 
We  propose  an  iterative  version  of  the  snapshot  proper  orthogonal  decomposition  (POD)  technique, 
termed  I-POD,  to  sequentially  construct  a  single  ROM  for  PDEs  that  is  capable  of  capturing  their 
behavior  over  the  entire  state  space  of  the  system,  and  not  just  around  the  snapshot  trajectory. 
Further,  the  technique  is  entirely  data  based,  and  is  applicable  to  forced  as  well  as  unforced  systems. 
The  I-POD  is  compared  to  two  other  ROM  techniques:  the  Balanced  POD  (  BPOD)  and  the  dynamic 
mode  decomposition  (DMD).  We  apply  the  ROM  generated  using  the  I-POD  technique  to  construct 
reduced  order  Kalman  filters  to  solve  the  filtering  problem.  The  methodology  is  tested  on  several 
1-dimensional  PDEs  of  interest  including  the  heat  equation,  the  wave  equation  and  2-dimensional 
pollutant  transport  equation. 


Keywords:  Proper  Orthogonal  Decomposition  (POD),  Filtering/  Data  Assimi¬ 
lation,  Partial  Differential  Equations. 

1.  Introduction.  In  this  paper,  we  are  interested  in  the  filtering/  data  assimi¬ 
lation  in  systems  that  are  governed  by  partial  differential  equations  (PDE).  We  take 
a  reduced  order  model  (ROM)  based  approach  to  the  problem.  We  propose  an  itera¬ 
tive  version  of  the  snapshot  proper  orthogonal  decomposition  (POD)  technique  that 
allows  us  to  form  an  ROM  of  the  PDE  of  interest  in  terms  of  the  eigenfunctions  of 
the  PDE  operator.  The  I-POD  is  compared  to  two  other  ROM  techniques:  the  Bal¬ 
anced  POD  (BPOD)  and  the  dynamic  mode  decomposition  (DMD)  technique.  We 
then  apply  this  ROM,  along  with  the  Kalman  filtering  technique,  to  form  a  reduced 
order  filter  for  the  PDE.  The  filter  is  constructed  in  an  offline-online  fashion  where 
the  expensive  computations  for  the  ROM  construction  is  accomplished  offline,  while 
the  online  part  consists  of  the  reduced  order  Kalman  filter  which  is  much  more  com¬ 
putational  tractable  than  the  full  problem.  The  technique  is  applied  to  several  one 
and  two  dimensional  partial  differential  equations. 

We  take  a  ROM  based  approach  to  solving  the  problem  of  filtering  in  PDEs.  In 
particular,  we  rely  on  the  so-called  proper  orthogonal  decomposition  (POD),  more 
precisely,  the  snapshot  POD  technique,  to  construct  ROMs  for  the  PDE  of  interest. 
The  POD  technique  has  been  used  extensively  in  the  Fluids  community  to  produce 
ROMs  of  fluid  physics  phenomenon  such  as  turbulence  and  fluid  structure  interac¬ 
tion  [1-4].  There  has  also  been  work  recently  to  produce  so-called  balanced  POD 
models  to  better  approximate  outputs  of  interest  through  an  amalgam  of  the  snap¬ 
shot  POD  and  the  balanced  model  reduction  paradigm  of  control  theory  [5]  to  produce 
computationally  efficient  balanced  POD  models  of  the  physical  phenomenon  of  inter¬ 
est  [6,7].  More  recently,  there  has  been  work  on  obtaining  information  regarding  the 
eigenfunctions  of  a  system  based  on  the  snapshot  POD,  called  the  dynamic  mode 
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decomposition  (DMD)  [8,9].  However,  a  couple  of  issues  are  central  to  the  construc¬ 
tion  of  the  snapshot  POD  technique:  1)  at  what  times  do  we  take  snapshots  of  the 
process,  and  2)  the  snapshot  POD  essentially  provides  a  reduced  basis  approximation 
of  the  localized  behavior  of  a  system  around  the  snapshot  trajectory,  is  there  a  con¬ 
structive  way  to  infer  the  behavior  of  a  system  from  the  snapshot  POD  away  from 
this  trajectory?  We  propose  a  randomly  perturbed  iterative  version  of  the  snapshot 
POD  (I-POD)  which  sequentially  allows  us  to  sample  the  process  of  interest  at  vari¬ 
ous  different  time  scales,  and  with  different  initial  condition.  Further,  we  show  that 
this  process  allows  us  to  theoretically  reconstruct  all  the  eigenfunctions  of  the  original 
system  using  either  data  from  experiments,  or  from  numerical  simulations  (similar  to 
the  DMD  approach)  thereby  allowing  us  to  sequentially  build  a  single  ROM  for  the 
system,  utilizing  multiple  sample  trajectories  of  the  system,  and  by  sampling  these 
trajectories  at  different  timescales.  To  the  best  of  our  knowledge,  our  work  is  the 
first  that  proves  that,  under  certain  assumptions,  the  snapshot  POD  does  extract  the 
eigenfunctions  of  the  underlying  operator  (though  ample  empirical  evidence  is  pro¬ 
vided,  this  is  not  proved  in  the  DMD  paper  [8]).  The  I-POD  approach  is  sequential 
and  involves  solving  a  sequence  of  small  eigenvalue  problems  to  get  a  single  ROM 
of  a  large  scale  system.  This  is  in  contrast  to  techniques  such  as  the  balanced  POD 
and  Eigensystem  Realization  Algorithm  (ERA)  [10],  that  may  require  the  solution  of 
a  very  large  Hankel  singular  value  decomposition  problem  if  the  number  of  inputs/ 
outputs  is  large.  Hence,  the  I-POD  can  be  a  computationally  attractive  alternative  to 
the  BPOD  in  such  a  scenario.  Also,  we  show  in  the  paper  that  the  I-POD  compares 
favorably  to  the  BPOD  and  DMD  in  accuracy.  Further,  the  sequential  extraction 
of  eigenfunctions  from  different  time  scales  is  in  general  not  possible  with  the  DMD 
approach  since  it  does  not  perform  adjoint  simulations  as  done  by  the  I-POD.  More¬ 
over,  to  the  best  of  the  knowledge  of  the  authors,  except  for  some  recent  work  in  the 
Fluid  mechanics  community  [11],  the  use  of  POD  based  ROMs  for  filtering  PDEs  is 
relatively  rare. 


The  problem  of  estimating  dynamic  spatially  distributed  processes  is  typically 
solved  using  the  Ensemble  Kalman  Filter  (EnKF)  and  has  been  used  extensively  in 
the  Geophysics  literature  [12, 13]  and  more  recently,  in  Dynamic  Data  Driven  Appli¬ 
cation  Systems  (DDDAS)  and  traffic  flow  problems  [14-20].  The  EnKF  is  a  particle 
based  Kalman  filter  that  maintains  an  ensemble  of  possible  realizations  of  the  dynamic 
map.  The  Kalman  prediction  and  measurement  update  steps  are  performed  using  en¬ 
semble  operations  instead  of  the  traditional  matrix  operations.  A  primary  issue  with 
the  EnKF  is  the  choice  of  the  ensemble  realizations  and  their  number.  This  is  almost 
always  done  in  a  heuristic  fashion.  Also,  the  prediction  stage  requires  expensive  for¬ 
ward  simulations  of  the  realizations  using  a  solver  which  can  take  a  significant  amount 
of  time.  Thus,  real  time  operation  is  an  issue.  In  contrast,  all  the  expensive  compu¬ 
tations  for  our  ROM  based  technique,  namely  POD  basis  and  ROM  generation,  are 
done  offline  and  hence,  real  time  operation  is  never  an  issue  given  the  offline  computa¬ 
tions.  Thus,  we  may  think  of  our  approach  as  a  computationally  tractable  alternative 
to  the  EnKF  algorithm.  Historically,  there  has  been  a  lot  of  theoretical  research  in 
the  Control  Systems  community  on  the  estimation  and  control  of  systems  driven  by 
PDEs  [21-30].  In  fact,  it  is  well  known  that  for  linear  PDEs,  an  infinite  dimensional 
version  of  the  Kalman  filter  exists  which  involves  the  solution  of  an  operator  Ricatti 
equation  [31].  This  can  be  very  computationally  intensive  and  may  be  unsuitable 
for  online  implementation.  In  contrast,  the  major  computational  complexity  of  the 
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ROM  based  technique  that  we  propose  is  offline  and  the  online  computations  are  es¬ 
sentially  trivial  thereby  making  the  technique  very  suitable  for  online  implementation. 

The  paper  is  organized  as  follows.  In  section  2,  we  introduce  the  filtering  problem. 
In  section  3,  we  present  the  offline  I-POD  procedure  used  to  construct  the  ROM  for 
large  scale  linear  systems.  In  Section  4,  we  compare  the  I-POD  technique  to  the 
BPOD  and  DMD  techniques.  In  section  5,  we  outline  the  online  portion  of  the 
reduced  order  Kalman  filter  constructed  for  filtering  along  with  error  bounds  for  the 
resulting  approximations.  Further,  we  apply  the  offline-online  procedure  problem  to 
the  solution  of  1-dimensional  PDE  filtering  problem  for  the  heat  equation,  the  wave 
equation  and  a  2-dinrensional  pollutant  transport  equation.  We  also  compare  the 
performance  of  the  I-POD  based  filter  with  that  of  the  full  order  Kalman  filter. 

2.  Preliminaries.  In  this  work,  we  are  interested  in  the  filtering  of  distributed 
parameter  systems,  systems  whose  evolution  is  governed  by  a  partial  differential  equa¬ 
tion  (PDE),  given  sparse  measurements  of  the  spatial-temporal  field  variable.  Math¬ 
ematically,  we  are  interested  in  estimating  the  state  of  the  field  variable  X  £  H,  for 
some  suitable  Hilbert  space  H.  The  state  X  is  governed  by  the  operator  equation 

X  =  AX  +  W,  (2.1) 

where  W  is  a  spatially  distributed  Gaussian  white  noise  process  perturbing  the  motion 
of  the  system  [32] .  We  assume  that  the  boundary  conditions  for  the  PDE  are  known. 
We  do  not  have  access  to  measurement  of  the  entire  states,  instead  we  only  have  access 
to  measurements  of  the  field  at  some  sparse  set  of  spatial  locations  in  the  domain  of 
the  process  given  by 


Y(tk,Xj)=CX(tk)+vlj\  (2.2) 

where  X(tk)  represents  the  state  at  the  discrete  time  instant  tk ,  and  Y(tk,Xj)  rep¬ 
resents  a  localized  measurement  of  the  state  variable  at  the  sparse  set  of  locations 
given  by  Xj,  j  =  1,  •  •  •  m,  and  V’jf'1  is  a  discrete  time  white  noise  process  corrupting 
the  measurements  at  the  spatial  location  Xj.  We  assume  that  the  differential  operator 
A  is  self  adjoint  with  a  compact  resolvent,  and  thus,  A  has  a  discrete  spectrum  with 
a  full  set  of  eigenvectors  that  forms  an  orthonormal  basis  for  the  Hilbert  space  H. 
Further,  we  assume  that  the  operator  A  generates  a  stable  semigroup.  In  Section  IV, 
we  extend  the  results  to  non  self-adjoint  operators. 

Given  the  above  assumptions,  we  can  discretize  the  PDE  above  using  compu¬ 
tational  techniques  such  as  finite  elements  (FE)/  finite  difference  (FD)  to  obtain  a 
discretized  version  of  the  operator  equations  in  Euclidean  space  given  by  the 
following: 


xk  =  Axk-i  +  wk, 

Vk  —  CXk  T  Vki  (2.3) 

where  Xk  is  a  discretized  version  of  the  state  X  that  resides  in  the  high  dimensional 
Euclidean  space  3lN ,  while  Wk  and  Vk  are  discretized  versions  of  the  white  noise  pro¬ 
cess  corrupting  the  state  and  measurement  equations  respectively.  Given  that  the 
operator  A  is  self-adjoint,  the  discretized  operator  A  is  (usually)  a  symmetric  ma¬ 
trix  with  a  full  set  of  eigenvectors  and  real  eigenvalues  whose  eigenvectors  form  an 
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orthonormal  basis  of  3ffw  and  for  suitable  large  N,  are  arbitrarily  good  approxima¬ 
tions  of  the  true  eigenfunctions  of  the  original  operator  A.  We  relax  the  self-adjoint 
assumption  later  in  this  section.  We  shall  assume  throughout  this  paper  that  a  fine 
enough  discretization  is  given  to  us  and  thus,  the  behavior  of  the  original  system,  in 
terms  of  the  trajectories  of  the  system,  is  captured  sufficiently  well  by  the  discretized 
version  of  the  system.  Thus,  in  the  rest  of  the  paper,  we  only  consider  the  discretized 
version  of  the  problem.  Given  the  above  discretization,  a  naive  approach  to  the  solu¬ 
tion  of  the  filtering  problems  is  to  use  a  standard  Kalman  filter  to  solve  the  problem. 
The  standard  Kalman  filter  problem  is  given  by  the  following  [33] : 

Given  that  xq,  vk,  vuk  are  jointly  gaussian  and  mutually  independent;  Xo  is 
N(x o,-Po)>  Vk  is  zero  mean,  covariance  RkSki',  Wk  is  zero  mean,  covariance  Qk5ki- 

Kalman  Filter: 


Xk- t-l/fc  (A  -RkC^jXk/k—l  ^-kUkA o/  — l  Xq 

Kk  =  T.k/k-iC  {CT,k/k-\C  +  Rk)  1 

^fc+i/fc  =  A[Y,k/k_ 1  —  '£‘k/k-\C  {CY,k/k-\C  +  Rk)  1CT,k/k-i]A'  +  Qk 
Xk/k  “  Xk/k—1  T  ^k/k—  lW  T  Rk)  ( Uk  Cxk/k—l) 

Zk/k  =  Zk/k-1  -  Xklk-iC'iCXk/k-iC'  +  Rk)-lCVk/k  -  1  (2.4) 


Due  to  the  very  high  dimensionality  of  $tN ,  since  N  can  easily  run  into  millions  of 
degrees  of  freedom  (DOF)  for  a  general  finely  discretized  PDE,  the  Kalman  filtering 
equations  are  computationally  intractable  for  such  high  DOF  systems.  For  example, 
for  pollutant  transport  equation,  with  N  =  2500,  E*,  is  a  2500  *  2500  matrix,  and  we 
need  to  compute  the  inverse  of  this  matrix  in  the  full  order  Kalman  filter  at  every  time 
step.  If  we  use  a  ROM,  then  E*,  is  only  a  30*30  matrix,  thereby  reducing  the  Kalman 
filter  computations  by  several  orders  of  magnitude.  In  general,  the  computational 
complexity  of  the  Kalman  filter  is  0(N3)  where  N  is  the  order  of  the  system.  Thus, 
if  the  order  of  the  ROM  Nr  «  N,  the  order  of  the  full  order  system,  then  the 
computational  gains  are  highly  significant.  Thus,  we  need  to  first  suitably  reduce 
the  order  of  the  system  before  we  apply  Kalman  filtering  techniques  to  the  above 
problem. 

3.  An  Iterative  Approach  to  Proper  Orthogonal  Decomposition  (I- 

POD).  Consider  the  following  linear  system: 

Xk  =  Axk- i,  given  x(0),  (3.1) 

where  xk  €  ,  N  is  very  large  and  A  is  a  symmetric  matrix.  Recall  that  the  above 

high  dimensional  linear  system  results  from  the  discretization  of  a  self  adjoint  linear 
operator. 

A  1.  We  assume  that  there  is  a  unique  null  vector  corresponding  to  A  and  that 
the  matrix  A  is  Hurwitz,  i.e.,  the  system  is  stable. 

Suppose  that  we  choose  some  arbitrary  initial  condition  a:(0)  and  take  M  snap¬ 
shots  of  the  system’s  trajectory  at  the  time  instants  t\  <0  <  ■  ■  ■  <  tM,  where  these 
snapshots  need  not  be  equi-spaced.  Let  us  denote  the  data  matrix  of  the  stacked 
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snapshots  by  X,  i.e., 


X  \x\ ,  X2 )  *  '  '  5  AT ]  , 

where  Xi  =  x{ti).  Suppose  now  that  the  number  of  snapshots  is  much  smaller  than  the 
dimension  of  the  system,  i.e.,  M  «  N .  Then,  using  the  snapshot  POD  technique, 
we  can  get  the  POD  basis  T  of  the  trajectory  encoded  in  the  snapshot  ensemble  X 
as  follows: 

T  =  XVP (3.2) 

where  Vp  and  are  the  eigenvector-eigenvalue  pair  corresponding  to  the  correlation 
matrix  X'  X,  i.e., 

(X'X)VP  =  VpEp.  (3.3) 

Note  that  the  M  x  M  eigenvalue  problem  to  be  solved  for  the  POD  eigenfunctions  is 
much  easier  than  the  high  dimensional  N  x  N  eigenvalue  problem  that  needs  to  be 
solved  if  we  were  interested  in  solving  for  the  eigenvalues  and  eigenvectors  of  A.  Given 
the  snapshot  POD  eigenfunctions,  we  can  obtain  a  reduced  order  approximation  of 
the  system  in  Eq.  3.1  as  follows: 

*l>. k  =  (T'AT)x/jk- 1  =  Aipk-u  (3-4) 

where  ip  represents  the  projection  of  the  system  state  onto  the  POD  eigenfunctions 
and  A  represents  the  reduced  order  M  x  M  system  matrix. 

Consider  the  reduced  order  system  matrix  A.  We  know  that  A  is  symmetric 
and  thus,  has  a  full  eigenvalue  decomposition.  Let  (A,  P)  represent  the  eigenvalue- 
eigenvector  pair  for  A,  i.e., 

AP  =  PA.  (3.5) 

Noting  that  A  =  PAP',  the  ROM  matrix  A  transformed  to  the  orthonormal  co¬ 
ordinates  specified  by  P,  can  be  represented  in  the  modal  co-ordinates  <p  as: 

<pk  =  A<j>k-i-  (3.6) 

Thus  it  follows  that 

A  =  ( P'T')A{TP ),  (3.7) 

where  T  is  the  POD  eigenfunction  matrix  and  P  is  the  ROM  eigenfunction  matrix. 
Note  that  T  is  N  x  M  and  that  P  is  M  x  M,  and  hence,  TP  is  N  x  M.  The  above 
equation  looks  like  an  eigendecomposition  of  the  matrix  A  except  that  M  «  N  and 
thus,  this  is  not  necessary.  Note  that  the  transformation  TP  denotes  the  transfor¬ 
mation  from  the  original  state  space  to  the  POD  eigenfunction  space  to  the  ROM 
eigenfunction  space.  In  the  following,  we  relate  the  eigenvalues  and  eigenvectors  of  A 
to  the  diagonal  form  A  and  the  transformation  TP. 

A  2.  Assume  that  “p”  eigenvectors  of  the  matrix  A  are  active  in  the  snapshot 
ensemble  X,  i.e., 

p 

Xi  =  ^  aZjVj,i  =  1, 2, . M, 

i= i 
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where  p  <  M  and  without  loss  of  generality,  it  is  assumed  that  the  active  eigenvectors 
consist  of  the  first  “p”  eigenvectors.  This  assumption  essentially  implies  that  the 
number  of  modes  active  within  the  snapshots  is  less  than  the  number  of  snapshots  in 
the  ensemble. 

Under  assumption  A1  and  A2,  the  following  result  is  true. 

Proposition  3.1.  The  columns  of  the  transformation  T P  are  the  eigenvectors 
of  A  with  corresponding  eigenvalues  encoded  in  the  diagonal  matrix  A,  i.e., 

A(TP )  =  (TP)  A.  (3.8) 

Proof.  Recall  that  T  =  XVpT,p  1^2.  We  have 

X  =  Va  =  [vi,v2,-  ■  ■  vp] 

where  note  that  V  is  an  A  x  p  and  a  is  an  p  x  M  matrix.  First,  consider  the  case 
when  p  =  M.  Then,  it  follows  that 

A  =  T'AT  =  E-^Vp'X' AXVPT,-1/2  =  (E“1/2TUa/)  (V'AV)  (oTpE”1/2) . 

M*M  M*M  M*M 

Comparing  this  to  the  fact  that  A  =  PAP' ,  it  follows  that 

P  =  S”1/2UpV  =  P,  (3.9) 

if  PP'  =  I.  To  show  this,  note  that 

PP'  =  E-^UpVoOUpE-1/2.  (3.10) 

Further,  a! a  =  X'X  =  VpYipVpl  and  hence,  noting  the  orthogonality  of  the  columns 
of  Up,  it  follows  that: 

PP'  =  E;1/2Up,UpEpUp'UpE-1/2  =  I.  (3.11) 

Further,  it  follows  that 

TP  =  (XUpE;1/2)(E;1/2UpV)  =  U(aUpE;1/2)(E;1/2Up,a')  =  VP'P  =  U(3.12) 

i.e.,  the  columns  of  TP  are  indeed  eigenvectors  of  A.  Moreover,  it  also  follows  that 
owing  to  the  uniqueness  of  the  similarity  transformation  of  A  that  the  eigenvalues 
corresponding  to  the  eigenvectors  in  TP  are  in  the  diagonal  form  A.  Hence,  this 
proves  our  assertion  for  the  case  when  p  =  M. 

When  p  <  M ,  the  rank  of  the  snapshot  ensemble  X  is  p  <  M  and  hence,  the  rank 
of  X'X  is  p  <  M.  Thus,  it  follows  that  the  POD  eigenvalues  will  be  non-zero  for 
only  p  POD  eigenfunctions.  Therefore,  the  transformation  into  the  POD  basis  T  = 
XVpTip  should  only  include  the  POD  eigenvectors  corresponding  to  the  non-zero 
eigenvalues.  This  implies  that  X'X  =  a' a  =  VpT,pVp,  where  Ep  contains  the  non¬ 
zero  POD  eigenvalues,  and  Vp  contains  the  corresponding  eigenvectors.  Further,  T  = 
XYpTip  ,  and  hence,  P,  as  defined  in  Eq.  3.9,  is  nowpxp.  Thus,  the  analysis  above 


6 


goes  through  unchanged,  and  hence,  PP'  =  I,  and  TP  =  V,  where  V  now  consists  of 
the  p  active  eigenvectors.  □ 

At  this  point,  we  make  the  following  remark. 

Remark  1.  Suppose  that  p  >  M  ,  i.e.,  the  number  of  active  eigenvectors  are 
more  than  the  number  of  snapshots.  WLOG,  let  p  =  N.  Then 

A  =  T'AT  =  AXVPT,-112  =  (E -'l2Vy)V'  AV(aVpY,-1/2) 

=  (E;1/2FpV)^(aypE-1/2)=  ^  ^ 

M*N  "  "  M*MM*Mm*M 

where  (/?,  T)  represents  the  eigenvalue  decomposition  of  the  ROM  matrix  A.  Note 
that  now  owing  to  the  fact  that  N  >  M,  we  can  no  longer  use  the  uniqueness  of  the 
similarity  transformation  of  A  to  conclude  that  the  transformation  T /3  contains  the 
eigenvectors  of  A. 

The  above  proposition  and  the  remark  above  suggest  a  technique  through  which 
eigenvectors  of  the  system  matrix  A  can  be  extracted  up  to  any  time-scale.  First,  we 
make  the  following  assumption. 

A  3.  We  assume  that  there  are  K  characteristic  timescales  embedded  in  the 
matrix  A,  namely  Ti,---Tk-  Let  the  eigenvalues  corresponding  to  timescale  Tj  be 
{A^\  •  •  •  A^. }  and  let  the  corresponding  eigenvectors  be  here, 

Mj  is  the  number  of  active  eigenvectors  in  the  jth  timescale.  Further,  we  assume  that 

the  timescales  are  well-separated,  i.e.,  if  for  some  tk,  (A^)  7^  0,  then  (A^)  «  0  for 

all  i  <  j.  The  above  assumption  essentially  implies  all  the  eigenvectors  corresponding 
to  timescales  below  a  given  timescale  decay  well  before  the  eigenvectors  at  the  given 
timescale  decay. 

At  this  point,  we  also  need  to  make  sure  that  all  possible  eigenfunctions  corre¬ 
sponding  to  any  timescale  are  excited.  Under  the  assumption  A1  and  A2,  the  following 
result  assures  us  of  this: 

Proposition  3.2.  Let  the  initial  condition  x^\0)  to  the  linear  system  in  Eq.  3.1 
be  chosen  according  to  a  Gaussian  distribution  N(0,a2I).  Then,  every  eigenvector  of 
A  is  excited  almost  surely  in  at  least  one  of  the  trajectories  X^) . 

Proof.  Due  to  the  eigenvalue  decomposition  of  A ,  we  may  write: 

N 

Xk  =  y^^i)k(x(0),Vj)vi, 

i= 1 

where  (., .)  denotes  the  inner  product  in  .  The  above  implies  that  ( Xk,Vi )  = 
Xk(x(0),Vi),  and  hence 

E\(xk,Vi)\2  =  (Xi)2kE\(x(0),Vi)\2  =  (A  i)2kv'PoVi  =  cr2(A,)2fc,  (3.13) 

here,  Pq  =  U[a;(0)a:(0)T]  =  a2I.  Thus,  the  ith  component  of  the  system  trajectory, 
i.e.,  the  contribution  of  the  ith  eigenvector,  is  a  Gaussian  random  variable  with  zero- 
mean  and  a  variance  that  geometrically  decays  in  time  as  shown  above.  Thus,  the  ith 
mode  is  bound  to  be  active  for  at  least  one  among  the  ensemble  of  trajectories.  In 
fact,  owing  to  the  Gaussian  nature  of  the  component,  it  is  true  that  its  absolute  value 
will  be  above  any  given  threshold,  at  any  given  time,  with  a  finite  probability.  □ 

Given  the  results  above  and  assumption  A3,  we  are  in  a  position  to  outline  a 
procedure  that  allows  us  to  isolate  all  eigenfunctions  at  any  given  timescale. 
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Suppose  without  loss  of  generality  that  Tf  >  T2  ■  ■  ■  >  Tk ■  Suppose  now  that  we 
are  interested  in  isolating  all  the  eigenfunctions  corresponding  to  the  timescale  Ti. 
We  choose  an  initial  time  t^ 1  and  subsequent  snapshot  times  t„\  n  =  1  :  M,  such 
that  M  >  Mi  and  such  that  the  initial  time  >>  T2.  Thus,  the  snapshot  timing 
assures  us  that  all  the  eigenvectors  at  the  timescales  below  Ti  will  have  decayed  by 
the  snapshot  times  of  interest,  and  thus,  the  only  participating  modes  are  the  eigen¬ 
functions  corresponding  to  timescale  T\.  Then,  using  Propositions  1  and  2,  we  know 
that  we  can  isolate  all  the  eigenfunctions  at  the  timescale  Ti  given  enough  snapshot 
ensembles.  In  particular,  suppose  that  Xj  is  the  jth  snapshot  ensemble  at  timescale 
T\.  Due  to  proposition  2,  as  j  — >  oo,  we  know  that  every  eigenfunction  in  set  is 
bound  to  be  excited.  Further,  due  to  the  fact  that  M  >  M1;  it  follows  using  Propo¬ 
sition  1  that  the  eigenfunctions  of  the  ROM  are  the  same  as  the  eigenfunctions  of 
A.  Thus,  every  snapshot  ensemble  gives  us  some  of  the  eigenvectors  ti  £  k*1'  and  as 
j  — »  oo,  we  are  assured  that  all  possible  v  G  V ^  are  recovered. 

Given  that  we  have  recovered  all  the  eigenfunctions  V ^  corresponding  to  the 
longest  timescale  Ti,  we  can  now  iteratively  recover  all  the  eigenfunctions  at  all  the 
subsequent  timescales  as  follows.  Given  V^,  we  randomly  choose  an  initial  condition 
x(0)  and  form  the  snapshot  ensemble  X  at  snapshot  times  such  that 

number  of  snapshots  M  >  M2,  and  the  initial  time  of  the  snapshot  t0  »  T3,  i.e. , 
such  that  all  eigenfunctions  at  timescales  shorter  than  T2  are  absent  in  the  ensemble. 
Given  the  snapshot  ensemble  X  we  subtract  the  contributions  of  the  eigenfunctions 
in  V^\  i.e., 


Mi 


*(#>)  =  xitfi-GG1’)' 


(2) 


fc=l 


(3.14) 


Consider  the  modified  snapshot  ensemble  X,  it  follows  that  X,  by  construction,  only 
contains  eigenfunctions  from  the  set  V ^  and  thus,  following  the  randomly  perturbed 
POD  procedure  outline  previously,  we  can  recover  all  the  eigenfunctions  in  V ^ .  Given 
G«  and  G<2\  we  can  repeat  the  removal,  and  randomly  perturbed  POD  procedure, 
to  recursively  obtain  all  the  sets  V ^  up  to  any  desired  timescale  Tn.  The  above 
development  can  be  summarized  in  the  following  algorithm: 

The  development  above  and  the  I-POD  algorithm  can  be  summarized  in  the  fol¬ 
lowing  result. 

Proposition  3.3.  Under  assumptions  A1-A3,  the  I-POD  algorithm  extracts  all 
eigenfunctions  corresponding  to  any  given  time  scale  . 

Remark  2.  In  practice,  we  choose  the  timescales  as  following:  first,  we  choose  a 
large  enough  time  so  that  most  of  the  system  eigenfunctions  have  decayed,  then  take 
several  snapshots  and  extract  the  eigenfunctions  corresponding  to  this  timescale  T\. 
After  subtracting  these  eigenfunctions  from  the  trajectory,  we  can  still  use  the  same 
timescale  T\ ,  i.  e,  the  she  snapshots,  or  guess  a  start  time  of  T2  (T2  <  T\ ),  and  then 
use  the  same,  or  different  snapshots,  respectively,  to  extract  the  next  set  of  eigenfucn- 
tions.  In  practice,  we  use  the  same  snapshots  to  extract  eigenvectors  until  we  cannot 
extract  any  more  in  this  timescale,  before  we  move  to  the  next  timescale. 


Algorithm  1  Algorithm  I-POD 

1.  Given  timescales  T),  •  •  ■  Tj< 

2.  Set  i  =  1,  V =  4>  (empty  set) 

3.  WHILE  i  <  I< 

DO 

(a)  Choose  snapshot  times  ■  ■  t^,  such  that  t^  »  Ti+1  and  M  >  Mi 

(b)  Set  j  =  l 

i.  Choose  Xq'(  ,  the  initial  condition  of  the  jth  snapshot  ensemble  at 
time  scale  7),  from  7V(0,  cr2/)  and  generate  the  jth  snapshot  ensem¬ 
ble  xf 

ii.  Substract  all  the  slower  eigenfunctions  from  the  snapshot  ensemble 
using  Eq.  3.14,  and  the  previously  extracted  eigenfunctions  from 
the  sets  V<2),  ■  ■  ■ 

iii.  Isolate  the  eigenfunctions  at  timescale  7),  V^\  using  the  snapshot 
POD.  Set  j  =  j  +  1 

iv.  If  all  eigenfunctions  in  have  been  obtained,  go  to  step  (c),  else 
go  to  step  (i) 

(c)  Set  i  =  i  +  1 

4.  Output  the  eigenfunctions  in  sets  ■  ■  ■  lAA4 


Remark  3.  The  timescales  are  dependent  on  the  Physics  and  can  he 

inferred  from  physical  insight  or  simulations.  The  number  of  snapshots  that  are  re¬ 
quired  to  extract  the  eigenfunctions  have  to  he  “large  enough”.  Of  course,  it  might  not 
he  possible  to  know  a  priori  when  M  is  large  enough.  However,  some  amount  of  trial 
and  error  can  tell  us  as  to  what  is  a  suitable  number  for  M .  In  fact,  a  good  heuristic 
measure  is  to  wait  long  enough  before  the  first  snapshot,  such  that  most  modes  have 
decayed  and  we  have  lesser  number  of  modes  participating  than  the  number  of  snap¬ 
shots.  In  fact,  this  is  a  heuristic  that  is  often  used  in  the  POD  literature  [8].  This 
can  easily  be  construed  from  the  eigenvalue  decomposition  of  the  snapshot  ensemble 
by  checking  for  zero  eigenvalues  and  eigenvectors,  i.e.,  rank  deficiency  of  the  snapshot 
ensemble. 

We  also  note  that  though  theoretically  we  can  extract  eigenfunctions  at  all  time 
scales,  due  to  small  numerical  errors  that  accumulate,  practically,  we  may  only  be 
able  to  extract  the  eigenfunctions  corresponding  to  the  first  few  timescales.  In  most 
applications,  these  first  few  timescales  are  typically  enough  to  form  a  good  ROM  (for 
instance,  please  see  the  examples  in  Section  f  ). 

The  I-POD  technique  is  a  completely  data  based  technique  and  does  not 
need  knowledge  of  the  system  matrix  A.  Note  that  ultimately,  the  ROM  A  =  T' AT, 
contains  all  the  information  regarding  the  eigenfunctions  of  the  operator  A  under 
the  assumptions  above.  Again  note  that  T  =  XVpT,p  1^2,  and  thus,  it  follows  that 
A  =  T' AXVpT,p  =  T' XVpT,p  1^2 ,  where  X  is  the  one  time  step  advanced  version 
of  the  snapshot  ensemble  X  (in  the  discrete  time  case),  and  can  be  obtained  directly 
from  simulation  or  experimental  data. 

It  should  also  be  noted  that  Proposition  1  does  not  distinguish  between 
forced  systems  and  unforced  systems  since  Assumption  2  under  which  the  result 
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is  valid  only  asks  for  certain  sufficient  conditions  on  the  active  eigenfunctions  of  the 
system  in  the  snapshot  ensemble.  Since  the  forced  response  of  a  linear  system  is  also 
expressed  in  terms  of  the  eigenfunctions,  the  I-POD  procedure  is  valid  for  forced  sys¬ 
tems  as  well  as  long  as  Assumption  2  is  valid.  Hence,  the  procedure  can  be  used  on 
experimental  data,  where  the  system  response  may  be  forced.  Of  course,  the  issue  is 
that  Assumption  2  underlying  Proposition  1  may  not  be  satisfied  for  forced  systems. 
However,  in  our  experiments  we  do  see  that  this  assumption  is  indeed  satisfied  and 
that  we  can  actually  extract  the  eigenfunctions  of  the  forced  system  using  the  I-POD 
procedure  (this  can  also  be  seen  from  the  results  in  [8]). 


Representative  results  from  our  experiments  are  shown  in  Figure  3.1.  In  Figure 
3.1(a)  and  Figure  3.1(b),  we  compare  the  actual  eigenvalues  of  a  randomly  generated 
100  x  100  system  with  those  obtained  by  the  I-POD  procedure,  for  an  unforced  as  well 
as  a  forced  (constant  forcing)  symmetric  system.  The  results  show  that  the  I-POD 
eigenvalues  agree  very  well  with  the  actual  eigenvalues. 


(a)  Comparison  of  actual  eigenvalues  with  those 
obtained  using  I-POD  for  a  symmetric  unforced 
100  X  100  system 


(c)  Comparison  of  actual  eigenvalues  with  those 
obtained  using  I-POD  for  a  non-symmetric  un¬ 
forced  100  x  100  system 


(b)  Comparison  of  actual  eigenvalues  with  those 
obtained  using  I-POD  for  a  symmetric  forced 
100  X  100  system 


(d)  Comparison  of  actual  eigenvalues  with  those 
obtained  using  I-POD  for  a  non-symmetric 
forced  100  x  100  system 


Fig.  3.1.  Eigenvalues  extraction  results  using  I-POD 
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3.1.  Extension  to  Non  Self-adjoint  Operators.  In  the  following,  we  show 
how  the  I-POD  technique  can  be  extended  to  non  self-adjoint  operators.  Again,  we 
restrict  our  attention  to  a  suitably  fine  discretization  of  the  underlying  infinite  di¬ 
mensional  systems  given  by  a  large  scale  matrix  A.  Suppose  that  X  is  the  snapshot 
data  from  the  simulation  of  the  response  (forced/  unforced)  of  the  matrix  A.  Let  Vp 
and  Ep  be  the  eigenvectors  and  eigenvalues  resulting  form  the  eigendecomposition  of 
the  snapshot  data  X'X,  and  let  T  =  XVpT,p  7  be  the  corresponding  POD  trans¬ 
formation.  The  reduced  order  model  is  then  given  by  A  =  T' AT.  Assuming  that 
the  ROM  can  be  diagonalized,  we  write  the  similarity  transformation  for  the  ROM, 
A  =  PAP-1.  Given  that  assumption  2  holds  (the  number  of  snapshots  are  greater 
than  the  number  of  active  eigenvectors),  we  have  the  following  result. 

Proposition  3.4.  The  eigenvalues  of  the  ROM  A,  given  by  the  diagonal  matrix 
A,  are  also  eigenvalues  of  the  full  order  model  A,  and  the  corresponding  eigenvectors 
are  given  by  the  transformation  TP. 

Proof.  Let  X  =  Va,  as  before,  where  V  denotes  the  active  eigenvectors  of  A  in 
the  snapshots,  and  a  is  the  coefficient  matrix  of  the  eigenvectors  for  all  the  snapshots. 
Let  the  number  of  active  eigenvectors  be  equal  to  the  number  of  snapshots.  Then 

P'ty’v)  p 

A  =  E  ~1/2Vpa'V'  AVaVpT,-112  =  Y,p1/2Vpa'V'V  AaVpE~1/2  =  PAP-1,  (3.15) 

where  A  are  the  eigenvalues  of  A  corresponding  to  the  eigenvectors  V.  Thus,  if  we 
show  that  P  is  the  inverse  of  P'iV'V),  then  due  to  the  uniqueness  of  the  similarity 
transformation  of  A,  it  follows  that  P  =  P-1  and  A  =  A.  To  show  this,  note  that: 

P'(V'V)P  =  Yi~1/2Vpa'  {V'V)aVpT.~1^2 .  (3.16) 

Here  a'(V'V)a  =  X'X  =  VpY>pVpl  and  therefore,  using  the  orthogonality  of  the 
columns  of  V^,,  it  follows  that 

P'(V'V)P  =  X-^V^VpXpV^VpE-1'2  =  I.  (3.17) 

Hence,  P  and  P'IV'V)  are  inverses  of  each  other.  It  can  also  be  easily  shown  that 
TP  =  V .  Further,  the  case  when  the  number  of  snapshots  is  greater  than  the  number 
of  active  eigenvectors  can  be  treated  identically  to  the  symmetric  case  considered  in 
Proposition  1.  □ 

In  the  non-symmetric  case,  P~1T'  does  not  contain  the  left  eigenvectors  of  A  as 
was  the  case  for  symmetric  systems.  In  fact,  P-1T'  is  the  pseudo-inverse  of  TP  =  V, 
i.e. ,  P-1T'  =  (U'U)-1!^'.  Thus,  even  though  we  know  the  right  eigenvectors  through 
the  POD  transformation  TP,  we  do  not  have  knowledge  of  the  left  eigenvectors  from 
POD. 

In  order  to  gain  knowledge  of  the  left  eigenvectors,  we  need  to  generate  data  Y 
from  the  adjoint  simulations,  i.e.,  using  matrix  A' .  Using  this  adjoint  simulation  data 
Y,  the  left  eigenvectors  of  A,  which  are  the  same  as  the  right  eigenvectors  of  A'  up  to 
a  multiplicative  constant,  can  be  found  using  the  I-POD  procedure.  In  other  words, 
the  POD  is  used  to  get  the  right  eigenvectors  of  A'  using  adjoint  simulation  data  Y 
thereby  providing  us  knowledge  of  the  left  eigenvectors  of  A.  Further,  random  ini¬ 
tial  conditions  can  be  used  to  generate  the  eigenvalues,  as  well  as  the  left  and  right 
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eigenvectors,  using  the  simulation  data  from  A  and  its  adjoint  A' ,  in  a  sequential 
fashion.  Finally,  the  left/  right  eigenvectors  corresponding  to  the  different  time  scales 
may  also  be  extracted  sequentially,  identical  to  the  symmetric  case.  Note  that  in  the 
non-symmetric  case,  the  left  eigenvectors  are  necessary  to  extract  the  eigenfunctions 
at  the  different  time  scales  using  the  sequential  I-POD  procedure,  since  the  initial 
amplitudes  of  the  different  eigenmodes  are  given  by  the  projection  of  the  initial  con¬ 
ditions  on  the  corresponding  left  eigenmodes  (note  that  the  DMD  cannot  do  the  same 
since  it  does  not  utilize  the  adjoint  information). 

In  Figure  3.1(c)  and  Figure  3.1(d),  we  show  the  comparison  of  the  eigenvalues  ob¬ 
tained  using  I-POD  with  the  true  eigenvalues  for  an  unforced  non-symmetric  system 
as  well  as  a  forced  non-symmetric  system.  In  fact,  we  can  find  as  many  eigenvalues 
as  we  want,  here  we  only  show  the  first  several  dominant  eigenvalues. 


4.  I-POD  using  input/  output  description.  In  the  following,  we  show  how 
to  extract  the  left  and  right  eigenvectors  of  the  system  using  POD,  and  the  input/ 
output  description  of  the  system,  as  opposed  to  the  random  initial  conditions  that 
have  been  used  so  far.  We  shall  also  compare  and  contrast  the  I-POD  with  two  closely 
related  approaches,  the  Balanced  POD  (BPOD),  and  the  Dynamic  Mode  Decompo¬ 
sition  (DMD)  approach. 

For  simplicity,  we  shall  consider  a  discrete  time  system  (due  to  the  technical 
problems  of  dealing  with  impulse  responses  in  continuous  time  systems) ,  and  also,  for 
ease  of  comparison  to  the  related  BPOD  and  DMD  approaches: 


xk  =  Axk-i  +  Buk- 1, 

yk  =  Cxk ■  (4.1) 

Let  the  input  influence  matrix  be  denoted  by  B  =  [bi ,  •  •  •  b.p]  and  the  output  matrix 
by  C  =  [ci,  • 

In  this  case,  the  columns  of  the  input  influence  matrix,  bj,  serve  as  the  initial 
conditions  for  the  simulation  of  the  system  A,  leading  to  the  snapshot  ensemble  Xj 
while  the  transposed  rows  of  the  output  matrix,  c/,  serve  as  the  initial  conditions  for 
the  simulations  of  the  adjoint  system  A',  leading  to  the  adjoint  snapshot  ensemble 
Y,j.  The  first  set  of  simulations  in  terms  of  the  snapshot  ensembles  Xj  leads  to  a  set 
of  right  eigenvalue-eigenvectors  (Ar,Vr),  while  the  adjoint  simulations  (Yj)  lead  to  a 
set  of  left  eigenvalues  and  eigenvectors  denoted  by  (A i,Vi).  In  particular,  in  order 
to  obtain  the  pair  (Ar,  Vr),  we  do  the  snapshot  POD  procedure,  followed  by  diag- 
onalization  of  the  ROM,  on  every  trajectory  Xj  sequentially.  Thus,  corresponding 
to  every  snapshot  ensemble  Xj,  we  will  obtain  the  corresponding  set  of  eigenvalues/ 
eigenvectors  (A/,  Vf).  For  every  subsequent  snapshot  ensemble  Xj,  we  only  add  the 
eigenvalues/  eigenvectors  that  are  not  already  extracted  from  Xk,  k  =  1, ...,  j  —  1.  A 
similar  procedure  can  be  used  to  extract  the  left  eigenvalue/  eigenvector  pair  (A/,  V)). 
Of  these  eigenvalue-eigenvectors  pairs,  we  only  keep  those  left/  right  eigenvectors  that 
correspond  to  the  eigenvalues  in  the  intersection  of  the  set  of  eigenvalues  in  A;,  and 
in  Ar.  Thus,  this  allows  us  to  keep  the  modes  of  the  system  that  are  both  observable 
and  controllable. 
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Suppose  we  discard  the  POD  modes  corresponding  to  the  small  singular  values 
of  the  POD,  then,  quantitatively,  we  retain  only  the  most  observable  and  most  con¬ 
trollable  modes  of  the  system  (A,  B ,  C)  in  V)  and  Vr  respectively.  If  we  only  consider 
those  eigenvectors  that  are  in  the  intersection  of  Ar  and  A /  as  mentioned  above,  then 
we  include  the  inodes  of  the  system  that  are  most  observable  as  well  as  controllable,  in 
the  ROM.  However,  it  is  not  necessary  that  this  method  result  in  a  balanced  decom¬ 
position.  Thus,  we  can  see  that  the  I-POD  can  sequentially  extract  the  eigenmodes 
present  in  a  single  trajectory  as  well  as  iteratively  extract  the  eigenmodes  from  dif¬ 
ferent  simulation  trajectories. 

Another  way  of  extracting  the  controllable/  observable  eigenmodes  corresponding 
to  the  input/  output  description  is  to  perform  an  SVD  on  the  cross  correlation  between 
the  primal  and  the  adjoint  simulations,  i.e. ,  an  SVD  of  Y-Xj  for  all  i,j.  In  particular, 
we  write: 


YlXj  =  up  Epv;, 

i  =  (T,p1/2UpYil)A(XjVp'Ep1/2)  =  PAt,P~\ 

A«  =  (P-'Y-^U’pYl)  A  (XjVpZ-^P) .  (4.2) 

$'  .  >1 Ui 

ij 

In  the  above,  it  can  be  shown  that  A contains  the  most  observable  and  controllable 
eigenmodes  present  in  the  responses  Yj  and  Xj.  However,  it  is  not  necessarily  true 
that  and  contain  the  most  observable  and  controllable  left  and  right  eigen¬ 
vectors  respectively.  However,  the  left  and  right  eigenvectors  corresponding  to  the 
eigenvalues  in  A ^  can  be  found  by  doing  the  eigenvalue  decomposition  of  Y/l/  and 
XjXj  respectively.  In  the  experiments  that  follow,  however,  we  use  and  xI'tj  as 
the  left/  right  eigenvectors. 

4.1.  Comparison  with  Balanced  POD.  In  the  following,  we  assume  that  the 
system  of  interest  has  to  outputs  and  p  inputs,  and  we  have  N  snapshots  for  each 
input/  output  trajectory.  In  this  case,  the  BPOD  has  to  solve  the  SVD  problem  of 
a  matrix  of  dimension  ( mN )  x  ( pN ).  The  I-POD  on  the  other  hand  solves  (to  +  p) 
eigenvalue  problems  of  dimension  NxN.  In  case  we  use  cross-correlations,  we  would 
have  to  solve  ( mp )  SVD  problems  of  dimension  NxN.  Hence,  it  can  be  seen  that  the 
I-POD  would  be  computationally  much  more  efficient  than  the  BPOD  if  the  number 
of  inputs/  outputs  of  a  system  is  large.  This  is  because  the  BPOD  requires  to  solve  an 
SVD  problem  for  a  matrix  formed  by  stacking  all  input/  output  correlation  matrices 
Y(Xj  in  a  large  Hankel  matrix,  whereas  the  I-POD  uses  the  eigendecomposition  of 
Y-Yj.  and  X'^Xj  (  or  Y-Xj  )  sequentially,  and  can  iteratively  build  up  the  ROM  due  to 
the  invariance  of  the  underlying  eigenmodes  to  the  different  trajectories  of  the  system. 

In  the  following,  two  examples  comparing  the  Balanced  POD  and  I-POD  are 
shown.  In  the  examples,  we  define  the  output  error  as: 

E0utput  —  ||I/rue  d/ed||i  (4.3) 

where  Ytrue  are  the  outputs  of  the  true  system  and  Yreci  are  the  outputs  of  the  reduced 
order  system. 

The  state  error  is  defined  as: 


Eft —  I  Xi/rl,  f.  X, 


ed 


(4.4) 
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where  XtrUe  are  the  states  of  the  true  system  and  Xred  are  the  states  of  the  reduced 
order  system. 


4.1.1.  Heat  Equation.  The  heat  transfer  by  conduction  along  a  slab  is  given 
by  the  partial  differential  equation: 


dT  d2T  , 
~di~ad^  +  f 


(4.5) 


The  length  of  the  slab  is  5 m  and  the  continuous  spatial  domain  X  is  divided  into 
100  grid  cells  of  equal  length.  The  model  is  simulated  for  a  period  of  100  minutes 
and  the  time  horizon  is  discretized  into  50000  time  steps,  a  is  thermal  diffusivity,  and 
takes  the  value  0.01.  /  is  the  constant  source  in  the  middle  of  the  slab,  and  f  takes 
the  value  50  °C /minute. 

The  boundary  condition  is:  T(0,t)  =  T(L,t)  =  0,Vf  >  0.  The  initial  condition  is 
200  °C  along  the  slab.  The  temperature  of  the  slab  is  measured  at  five  equi-spaced 
points  along  its  length. 

For  Balanced  POD,  100  snapshots  from  the  first  3000  time  steps  are  collected,  the 
snapshots  are  equi-spaced.  We  use  20  modes  to  construct  the  Balanced  POD  based 
ROM.  For  I-POD,  we  use  the  same  initial  conditions  (  input/  output  description  of 
the  system  )  as  Balanced  POD.  We  also  collect  the  snapshots  from  the  first  3000  time 
steps,  but  in  6  timescales.  Using  the  I-POD  technique,  we  can  collect  more  modes  se¬ 
quentially,  and  we  use  30  modes  to  construct  the  I-POD  based  ROM.  The  comparison 
between  the  Balanced  POD  and  I-POD  is  shown  in  Figure  4.1.  In  Figure  4.1(a)-(b), 
we  compare  the  eigenvalues  extracted  by  I-POD  and  Balanced  POD  (i.e. ,  eigenvalues 
of  the  ROM  constructed  by  BPOD)  . 

To  test  the  ROM,  we  use  20  different  random  initial  conditions  and  take  the  av¬ 
erage  output/state  error  over  these  20  simulation.  In  Figure  4.1(c),  we  compare  the 
output  errors  of  the  two  algorithm.  In  Figure  4.1(d),  we  compare  the  state  errors  of 
the  two  algorithm. 


4.1.2.  Pollutant  Transport  Equation.  The  two-dimensional  pollutant  trans¬ 
port  equation  describe  the  contaminant  transport  is: 


dc(x,y,t) 

dt 


d2c(x,t)  d2c(x,t)  dc(x,t)  dc(x,t) 
dx2  y  dy 2  Vx  dx  Vy  dy 


+  Ss, 


(4.6) 


where  c  is  concentration  of  the  contaminant,  D  is  dispersion  and  takes  value  0.6  here, 
v  is  velocity  in  the  x  and  y  directions  and  takes  value  1,  and  Ss  is  source  of  pollutant. 


In  simulation,  there  are  three  obstacles  and  three  sources  in  the  field.  The  initial 
condition  for  simulation  is  zero.  We  use  Neumann  boundary  conditions.  Ten  mea¬ 
surements  are  taken  equi-spaced  along  the  diagonal  of  the  field.  The  actual  field  at 
the  end  of  simulation  is  shown  in  Figure  4.2.  The  held  is  discretized  into  a  50*50  grid. 
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Eigenvalues  extract  by  I-POD 


Eigenvalues  extract  by  BPOD 
Actual  eigenvalues 
*  Eigenvalues  extract  by  BPOD 


(a)  Comparison  of  eigenvalues  extract  by  I-POD  (b)  Comparison  of  eigenvalues  extract  by  BPOD 
and  actual  eigenvalues  and  actual  eigenvalues 
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Fig.  4.1.  Comparison  between  BPOD  and  I-POD  for  Heat  equation 


Fig.  4.2.  Actual  field  at  the  end  of  simulation  for  pollutant  transport  equation 


For  Balanced  POD,  we  collect  the  100  snapshots  from  the  first  3000  time  steps. 
The  snapshots  are  equi-spaced,  and  we  can  construct  the  BPOD  based  ROM  using  22 
modes.  Note  that  we  can  solve  the  singular  value  decomposition  problem  using  ten 
measurements  and  three  inputs  for  this  problem,  but  we  are  not  able  to  solve  the  large 
singular  value  decomposition  problems  as  the  measurements  increase,  for  instance,  if 
we  measure  the  entire  field.  For  I-POD,  we  use  random  initial  condition,  and  collect 
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the  100  snapshots  from  the  first  3000  time  steps.  The  snapshots  are  equi-spaced,  and 
we  construct  the  I-POD  based  ROM  (using  Y'X)  using  30  modes. 


Eigenvalues  extract  by  I-POD  Eigenvalues  extract  by  BPOD 


(a)  Comparison  of  eigenvalues  extract  by  I-POD  (b)  Comparison  of  eigenvalues  extract  by  BPOD 
and  actual  eigenvalues 
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(d)  Comparison  of  state  errors 


Fig.  4.3.  Comparison  between  BPOD  and  I-POD  for  Pollutant  Transport  Equation 


The  comparisons  between  the  Balanced  POD  and  I-POD  is  shown  in  Figure  4.3. 
In  Figure  4.3(a)-(b),  we  compare  the  eigenvalues  extract  by  I-POD  and  Balanced 
POD.  To  test  the  ROM,  we  use  different  random  forcing  and  take  the  average  out¬ 
put/state  error  over  20  different  simulations.  In  Figure  4.3(c),  we  compare  the  aver¬ 
aged  output  errors  of  the  two  algorithm.  In  Figure  4.3(d),  we  compare  the  averaged 
state  errors  of  the  two  algorithm. 

From  the  two  examples,  we  can  see  that  at  the  beginning  of  the  simulation,  the 
output  error  using  I-POD  is  larger  than  Balanced  POD,  but  the  errors  decay  fast  and 
I-POD  is  more  accurate  then  Balanced  POD,  as  time  increases.  The  state  error  is 
lower  for  the  I-POD  when  compared  to  the  BPOD. 


4.2.  Comparison  with  DMD.  The  I-POD  can  be  construed  as  a  sequence  of 
DMDs  to  extract  the  eigenmodes  of  a  system  from  a  given  trajectory  targeting  the 
different  timescales.  Further,  the  I-POD  can  be  thought  of  as  a  sequence  of  DMDs 
that  iteratively  extracts  the  eigenmodes  present  in  different  trajectories  of  a  system. 
The  I-POD  also  utilizes  adjoint  simulations  to  obtain  information  regarding  the  left 
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eigenvectors  of  a  system,  and  hence,  can  construct  a  more  accurate  ROM  of  the  system 
in  modal  space  than  the  DMD.  To  see  this,  note  that  the  ROM  in  modal  space  is 
given  by: 


ipk  =  V{AVripk- 1  +  V/Bwk- 1, 

Vk  =  CVriPk.  (4.7) 

The  I-POD  can  accurately  extract  the  left  eigenvectors  Vj  from  the  adjoint  simula¬ 
tions  while  the  DMD  has  to  rely  on  a  pseudo-inverse  of  Vr  as  an  approximation  of  the 
left  eigenvectors  V).  Thus,  theoretically,  the  I-POD  approximation  is  guaranteed  to 
be  better  than  the  DMD  because  it  exploits  the  adjoint  information.  The  DMD  also 
does  not  have  the  time  scale  targeting  property  of  the  I-POD  given  a  single  trajectory 
of  a  system.  Finally,  the  I-POD  also  theoretically  shows  why  a  DMD  type  procedure 
is  capable  of  extracting  the  underlying  eigenmodes  of  the  system  (under  Assumption 
A2). 

Here,  we  compare  the  two  algorithms  for  the  heat  equation  and  pollutant  transport 
equation. 


4.2.1.  Heat  equation.  For  DMD,  we  use  a  random  initial  condition  (if  use  the 
actual  initial  condition,  it  will  give  a  worse  results),  and  collect  the  100  snapshots 
from  the  first  3000  time  steps.  The  snapshots  are  equi-spaced,  and  we  construct 
DMD  based  ROM  using  16  modes.  We  construct  the  I-POD  based  ROM  as  men¬ 
tioned  before.  The  comparisons  between  the  DMD  and  I-POD  are  shown  in  Figure 
4.4.  In  Figure  4.4(a)-(b),  we  compare  the  eigenvalues  extract  by  I-POD  and  DMD. 
To  test  the  ROM,  we  use  20  different  random  initial  conditions  and  take  the  average 
output/state  error  of  these  20  simulation.  In  Figure  4.4(c),  we  compare  the  averaged 
output  errors  of  the  two  algorithm.  In  Figure  4.4(d),  we  compare  the  averaged  state 
errors  of  the  two  algorithm. 


4.2.2.  Pollutant  Transport  Equation.  For  DMD,  we  use  the  random  initial 
condition,  and  collect  the  100  snapshots  from  the  first  3000  time  steps.  The  snapshots 
are  equi-spaced,  and  we  can  construct  the  DMD  based  ROM  using  30  modes.  We 
construct  the  I-POD  based  ROM  as  mentioned  before.  The  comparison  between  the 
DMD  and  I-POD  is  shown  in  Figure  4.5.  In  Figure  4.5(a)-(b),  we  compare  the  eigen¬ 
values  extract  by  I-POD  and  DMD.  To  test  the  ROM,  we  use  20  different  random 
time  forcing  and  take  the  average  output/state  error  over  20  simulations.  In  Figure 
4.5(c),  we  compare  the  average  output  errors  of  the  two  algorithm.  In  Figure  4.5(d), 
we  compare  the  average  state  errors  of  the  two  algorithms. 

From  the  two  examples,  we  can  see  that  I-POD  is  significantly  better  than  DMD, 
especially  for  the  2D  pollutant  transport  equation.  However,  this  should  not  be  a 
surprise  since  the  I-POD  uses  different  time-scales,  as  well  as  the  adjoint  information 
to  get  a  higher  fidelity  ROM. 


5.  Application  of  I-POD  to  Filtering  of  Partial  Differential  Equations. 

Consider  now  the  continuous-discrete  filtering  of  the  distributed  parameter  system  in 
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(d)  Comparision  of  state  errors 


Fig.  4.4.  Comparison  between  DMD  and  I-POD  for  Heat  equation 


the  high  dimensional  discretized  setting: 

xk  =  Axk-i+Wk,  (5.1) 

Vk  =  Cxk  +  Vk,  (5.2) 

where  recall  that  yk  £  3?p  is  the  measurement  at  time  tk,  Wk  is  white  noise  process 
perturbing  the  systems  while  Vk  is  a  measurement  white  noise  process  corrupting  the 
measurement  at  time  tk-  Here,  Wk  and  Vk  are  independent  and  the  initial  condition 
x(0)  is  independent  of  Wk  and  Vk  too.  Typically,  the  measurements  are  very  sparse, 
i.e.,  p  «  N,  and  N  is  very  large.  Hence,  using  standard  estimation  theoretic  tech¬ 
niques  such  as  the  Kalman  filter  for  filtering  the  above  system  is  out  of  question  owing 
to  the  high  dimensionality  of  the  resulting  problem  (the  Kalman  filter  requires  0(N3) 
operations  at  every  update  step.  Thus,  it  is  vitally  important  that  suitable  ROMs  be 
devised  to  alleviate  the  computational  intractability  of  the  problem  above.  Since  we 
are  considering  the  discrete  setting  for  filtering,  let  us  assume  that  the  measurements 
are  taken  time  T  apart. 

In  order  to  form  a  suitable  ROM  of  the  above  system,  suppose  we  keep  only  Nr  of 
the  eigenfunctions  of  A  as  modes  of  the  ROM.  Under  assumption  A1-A2,  the  expected 
value  of  the  error  between  the  true  system  and  the  ROM  at  any  time  is  given  by  the 
following  result. 
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Fig.  4.5.  Comparison  between  DMD  and  I-POD  for  Pollutant  Transport  Equation 


Proposition  5.1.  The  expected  value  of  the  squared  error  in  keeping  only  Nr 
modes  in  the  ROM  is  given  by 

N  N  (\.\2k  _  i 

E\\ek\\2=  Y,  (Xi)2kE\(x(0),Ui)\2  +  Y  _  1  ), 

i—Nr+l  i=Nr  +  l  ^  *' 

a2  =  u'iRwUi ,  (5.3) 

where  ek  =  x(tk)  —  xred(tk),  x(tk)  is  the  state  of  the  true  system,  xred{tk)  is  the  state 
of  the  reduced  order  system,  denotes  the  inner  product  in  Rn,  Ui  are  the  left 

eigenvectors,  and  Rw  represents  the  covariance  of  the  white  noise  process  wk . 

Note  that  the  first  term  in  the  above  expression  is  due  to  the  initial  conditions 
while  the  second  term  is  due  to  the  random  perturbation  wk- 

Proof.  The  error  incurred  in  keeping  only  Nr  modes  in  the  ROM  is  given  by: 

N  N 

efc  =  Y  (^i)k{x(0),Ui)vi  +  Y  &i(tk)Vi, 
i=Nr+ 1  i=Nr-\-l 
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where 


a  r(tk)  =  jr^y^crir), 

r— 0 

c7(t)  =  {wT,  Ui). 

Here,  iq  are  the  left  eigenvectors  while  v j  are  the  right  eigenvectors.  Then,  it  follows 
that: 


E\\ek\\2  =  ^i)2kmm,ui)\2  +  Y/E\A7(tk)\2,  (5.4) 

i  i 

where  for  notational  ease  the  subscript  i  is  used  to  denote  the  summation  from  Nr  + 1 
to  N.  Then, 


tk  tk 


|Ar(4)|2  =  E  E  \tk-s)c7(r)c7(s). 


T— 0 S— 0 


(5.5) 


Noting  that 


c7(t)c7(s)  =  (wT,Ui)(ws,Ui)  =  E  Wj(T)wk(s)uijUik, 

j,k 


where  Uij  denotes  the  jth  component  of  Ui ,  it  follows  that 

E[c7 {t)c7 {s)\  =  ^ ~^UijUikE[wj{r)wk{s)\  =  u^R^It  -  s}ui,  (5.6) 

j,k 

where  <5 [.]  denotes  the  Kronecker  delta  function.  Then,  substituting  the  above  equa¬ 
tion  back  into  Eq.  5.5,  and  using  the  result  in  Eq.  5.4,  while  using  the  sampling 
property  of  the  Kronecker  delta  function  under  an  integral,  the  result  follows.  □ 
Thus,  we  have: 

N 

£||efc||2<  E  (Ai)2fc£|(s(0),ui)|2  + 

i=Nr+l 

K - S' - / 

initial  condition  error  random  perturbation  error 

of  =  u'iRyjUi  (5.7) 

The  above  expression  gives  an  estimate  of  the  error  made  in  keeping  only  Nr  modes 
in  the  solution.  Note  that  the  measurement  equations  are  immaterial  in  these  error 
estimates  since  they  do  not  alter  the  system  equations  in  any  fashion.  In  fact,  the 
above  is  an  a  priori  estimate  that  is  averaged  over  all  possible  future  observations. 

Given  the  measurement  time  interval  T,  and  the  probability  density  function  of 
the  initial  state  Xq ,  we  can  neglect  those  modes  such  that  AfT  «  0  and  thus,  the 
perturbation  due  to  the  initial  condition  is  negligible.  Of  course,  the  error  due  to  the 
stochastic  perturbations  remains,  however,  theoretically,  we  can  get  all  A i  since  that 
is  assured  us  by  Proposition  3  and  I-POD,  and  hence,  we  can  make  an  a  priori  error 


N 

E  - 

i=Nr  + 1 


f( 


(A,)2fc  -  1 
(A,:)2  ^  1 
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estimate  regarding  the  error  made  in  keeping  only  Nr  modes  of  the  system. 

Given  that  we  have  sufficient  number  of  modes  in  our  ROM  such  that  the  error 
in  the  reduced  solution  is  within  some  pre-specified  bounds,  the  filtering  of  the  PDE 
proceeds  as  follows.  We  choose  those  eigenfunctions  that  need  to  be  kept,  given  that 
they  have  already  been  extracted  using  the  I-POD  procedure,  and  form  the  ROM 
for  the  filtering  problem  as  follows.  Define  the  transform  Vr  =  [v-| ,  •  •  •  Vjgr]  consisting 
of  the  retained  right  eigenmodes  and  V\  =  [ui,  •  •  •  uatJ,  the  retained  left  eigenmodes. 
The  filtering  ROM  is  the  following: 

Vtfc  =  ( V{AVr Wfc— 1  +  V/Bwk,  V>(0)  =  (*(0),  Vi), 

Uk  =  ( CVr)ipk  +  Vk,  (5.8) 


The  above  system  now  results  in  an  Nr  x  Nr  filtering  problem  with  Nr  <<  N  and 
thus,  standard  estimation  theoretic  methods  such  as  the  Kalman  filter  can  be  used  to 
solve  the  problem.  In  the  following  sections,  we  show  the  application  of  the  I-POD 
based  filtering  technique  to  the  1-D  Heat  equation,  the  1-D  wave  equation  and  the 
2-D  pollutant  transport  equation. 

5.1.  Heat  Equation.  The  heat  equation  is  given  is  Section  4,  here,  we  use  the 
same  initial  condition  and  boundary  condition  as  in  Section  4. 


First,  we  compare  the  right  eigenvectors  extracted  by  the  I-POD  technique  with 
the  full  order  system  in  Figure  5.1. 


rst  Right  Eigenvector  (ROM) 


(a) 


Third  Right  Eigenvector  (ROM) 


Right  Eigenvector  (ROM) 


(c) 


(d) 


Fig.  5.1.  Comparison  of  right  eigenvectors  extracted  by  I-POD  (  on  the  top  )  and  full  order 
system  (  on  the  bottom  ) 


The  comparisons  between  the  reduced  order  filter  and  real  system  at  four  different 
time  steps  are  shown  in  Figure  5.2(a).  The  red  curves  are  the  real  heat  profiles  and 
the  blue  curves  are  filter  results  from  the  reduced  model.  In  Figure  5.2(b)-(d),  the 
error  and  the  3tr  boundary  for  the  reduced  model,  at  three  different  chosen  location 
are  shown  too. 
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>r  and  3a  at  0.25L 


>r  and  3a  at  0.5L 


>r  and  3a  at  0.75L 


r  i 

(a)  Comparison  of  (b)  Error  and  3<r  error  (c)  Error  and  3cr  error  (d)  Error  and  3cr  error 
ROM  filter  (in  blue)  bounds  at  0.25L  bounds  at  0.5L  bounds  at  0.75L 

with  actual  heat 
profile  (in  red) 


Fig.  5.2.  Filter  results  for  heat  equation 


It  can  be  seen  that  the  I-POD  ROM  based  Kalman  filter  provides  a  good  estimate 
of  the  temperature  profile  for  the  problem  in  that  the  errors  stay  within  the  3  er  error 
bounds  (a  common  engineering  measure  for  verifying  the  consistency  of  a  filter). 

5.2.  Wave  Equation.  The  one  dimensional  wave  equation  describing  a  vibrat¬ 
ing  string  on  a  finite  interval  is: 


d2u(x,t)  2d2u{x,t) 
dt2  C  dx 2 


The  string  vibration  is  simulated  for  a  period  of  1.5  seconds  and  the  time  domain 
T  is  discretized  into  200  time  steps.  The  length  of  the  string  is  lm  and  is  divided  by 
M  =  100  grid  points.  The  variable  c  is  the  wave  speed,  and  takes  value  1  here. 

The  boundary  conditions  for  a  fixed  end  string  are: 

ux= o  =  0;  ux=M  =  0.  (5.10) 


Displacements  are  measured  at  five  equi-spaced  points  along  the  string.  A  random 
initial  condition  is  used  for  generating  the  reduced  order  model,  while  the  initial 
condition  of  the  string  for  simulation  is: 


duo 

dt 


uq  =  20sm(27r 


AJL 

if  x  < 

M 

2 

4  -  4— 

if  x  > 

M 

2  > 

(5.11) 


In  Figure  5.3(a),  we  show  the  comparison  between  the  real  wave  profiles  and  the 
reduced  model  filter  at  three  different  time  steps.  The  real  wave  profiles  are  in  red 
while  the  filter  results  are  shown  in  blue.  Also,  in  Figure  5.3  (b)-(d),  the  errors  and 
the  3<t  boundary  for  the  reduced  model  at  three  different  chosen  location  are  shown. 

Again,  these  results  show  that  the  ROM  based  filter  is  capable  of  getting  a  good 
estimate  of  the  wave  profile  based  on  the  noisy  measurements. 
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(a)  Comparison  of  (b)  Error  and  3 a  error  (c)  Error  and  3cr  error  (d)  Error  and  3<r  error 
ROM  filter  (in  blue)  bounds  at  0.3L  bounds  at  0.5L  bounds  at  0.7L 

with  actual  wave  profile 
(in  red) 


Fig.  5.3.  Filter  results  for  wave  equation 


(a)  1st  Right  eigen-  (b)  2nd  Right  eigen-  (c)  3rd  Right  eigen-  (d)  4th  Right  eigen¬ 
function  (ROM)  function(ROM)  function(ROM)  function(ROM) 


(e)  1st  Right  eigen-  (f)  2nd  Right  eigen-  (g)  3rd  Right  eigen-  (h)  4th  Right  eigen¬ 
function  function  function  function 

Fig.  5.4.  Comparison  of  Right  eigenfunctions 


5.3.  Pollutant  transport  equation.  The  2D  pollutant  transport  equation  is 
given  in  Section  4,  the  initial  condition  and  boundary  condition  are  the  same  as  in 
Section  4.  The  first  four  right  eigenfunctions  extracted  by  I-POD  technique  (on  the 
top)  and  the  full  order  system  (on  the  bottom)  are  compared  in  Figure  5.4. 

Figure  5.5  (a)  and  (b)  compare  the  filter  results  with  the  actual  field  at  the  end 
of  the  simulation.  Also,  in  Figure  5.5  (c)-(d),  the  errors  and  the  3er  boundary  for  the 
reduced  model  at  two  different  chosen  location  are  shown. 


These  results  show  that  for  the  2-dimensional  pollutant  transport  case,  the  ROM 
based  filter  is  capable  of  getting  a  good  estimate  of  the  actual  fluid  field  based  on  the 
noisy  measurements. 


5.4.  Comparison  with  Full  Order  Kalman  filter.  Now,  we  compare  the 
accuracy  and  the  computational  cost  of  the  I-POD  ROM  based  Kalman  filter  with 
the  full  order  Kalman  Filter.  For  the  heat  equation,  we  need  15.62s  using  ROM  fil¬ 
tering,  and  18.45s  using  the  full  order  Kalman  filter.  The  error  and  3cr  boundary 
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(a)  ROM  Filter  result  (b)  Actual  field  at  the  (c)  Error  and  3cr  error  (d)  Error  and  3cr  error 
at  the  end  of  Simula-  end  of  simulation  bounds  at  (45,45)  bounds  at  (10,10) 

tion 


Fig.  5.5.  Filter  results  for  Pollutant  transport  equation 


(a)  Comparison  of  KF  (b)  Comparison  of  KF  (c)  Comparison  of  KF  (d)  Comparison  of  KF 
for  heat  equation  for  heat  equation  for  wave  equation  for  wave  equation 


Fig.  5.6.  Comparison  for  ROM  Kalman  filter  and  Full  order  Kalman  filter 


comparing  ROM  and  full  order  Kalman  filter  at  two  different  locations  are  shown  in 
Figure  5.6  (a)  and  (b).  For  wave  equation  ,  we  need  2s  using  ROM  filtering,  and 
5.4s  using  the  full  order  Kalman  filter.  The  error  and  3er  boundary  comparing  ROM 
and  full  order  filter  at  two  different  locations  are  shown  in  Figure  5.6  (c)  and  (d). 
For  pollutant  transport  equation,  we  need  765.5s  using  ROM  filtering,  and  we  cannot 
implement  the  full  order  Kalman  filter  in  real  time  due  to  the  high  dimensionality 
of  the  problem.  As  can  be  seen  from  these  examples,  the  error  and  the  confidence 
bounds  of  the  estimates  are  better  in  the  case  of  the  full  order  Kalman  filter,  however, 
at  a  much  higher  computational  overhead.  It  should  also  be  noted  that  though  the 
errors  and  confidence  bounds  of  the  ROM  filter  are  higher,  the  ROM  filter  is  nonethe¬ 
less  consistent,  i.e. ,  the  ROM  filter  errors  remain  within  the  ROM  confidence  bounds. 


6.  Conclusion.  In  this  paper,  we  have  introduced  a  data-based  iterative  snap¬ 
shot  POD  (I-POD)  approach  to  form  ROMs  for  large  scale  linear  systems.  We  have 
used  the  I-POD  based  ROM  to  form  a  reduced  order  Kalman  filter  for  application 
to  the  filtering  of  linear  partial  differential  equations.  We  have  compared  the  perfor¬ 
mance  of  the  ROMs  formed  by  the  I-POD  to  those  formed  using  BPOD  and  DMD, 
and  shown  that  the  I-POD  compares  favorably  with  these  techniques.  We  have  shown 
the  application  of  the  I-POD  ROM  based  filtering  technique  to  the  heat,  wave  and 
pollutant  transport  equations.  As  can  be  seen  from  the  results,  the  linear  ROM  based 
filtering  performed  well  when  compared  to  the  full  order  Kalman  filter,  while  taking 
significantly  less  time.  The  most  pressing  need  at  this  point  is  to  be  able  to  extend 
the  I-POD  technique  to  nonlinear  systems.  Further,  we  will  also  apply  the  tech¬ 
nique  to  more  realistic  2  and  3-dinrensional  partial  differential  equations  that  may  be 
encountered  in  practice  such  as  fluid  flow  problems. 
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Information  Space  Receding  Horizon  Control 


Z.Sunberg,  S.  Chakravorty,  R.  Scott  Erwin 


Abstract — In  this  paper,  we  present  a  receding  horizon  solution 
to  the  optimal  sensor  scheduling  problem.  The  optimal  sensor 
scheduling  problem  can  be  posed  as  a  Partially  Observed  Markov 
Decision  Process  (POMDP)  whose  solution  is  given  by  an  Infor¬ 
mation  Space  (I-space)  Dynamic  Programming  (DP)  problem.  We 
present  a  simulation  based  stochastic  optimization  technique  that, 
combined  with  a  receding  horizon  approach,  obviates  the  need  to 
solve  the  computationally  intractable  I-space  DP  problem.  The 
technique  is  tested  on  a  sensor  scheduling  problems  in  which  a 
sensor  must  choose  among  the  measurements  of  N  dynamical 
systems  in  a  manner  that  maximizes  information  regarding  the 
aggregate  system  over  an  infinite  horizon.  While  simple,  such 
problems  nonetheless  lead  to  very  high  dimensional  dynamic 
programming  problems  to  which  the  receding  horizon  approach 
is  well  suited. 


I.  Introduction 

In  this  paper,  we  consider  the  problem  of  optimal  sensor 
scheduling  such  that  the  information  gained  by  the  sensor 
is  maximized.  In  particular,  this  problem  is  motivated  by 
the  so-called  Space  Situational  Awareness  (SSA)  problem 
where  a  large  number  of  space-based  targets,  such  as  space 
debris,  satellites  and  asteroids,  have  to  be  monitored  using 
far  fewer  sensors,  while  maximizing  the  information  we  have 
about  these  objects  [1],  It  is  easily  shown  that  the  scheduling 
problem,  in  general,  may  be  posed  as  a  Partially  Observed 
Markov  Decision  Problem  (POMDP)  whose  solution  is  given 
by  an  information  space  (I-space)  Dynamic  Programming 
(DP)  problem.  We  propose  a  receding  horizon  control  (I-space 
RHC:  IS-RHC)  approach  to  solve  such  I-space  DP  problems. 
The  online  stochastic  optimization  problems  that  result  from 
the  receding  horizon  approach  are  solved  using  a  simulation 
based  gradient  ascent  technique.  The  IS-RHC  technique  is 
tested  on  a  simple  scheduling  problem  where  the  sensor  has 
a  choice  between  measurements  of  N  dynamical  systems. 

In  recent  years,  the  optimal  sensor  scheduling  problem 
has  garnered  much  interest  in  the  Control  and  Robotics 
community  and  is  variously  known  as  Information-theoretic 
Control/  Active  Sensing  and  Dual  Control  [2],  [3].  Discrete 
dynamic  scenarios  such  as  target  tracking  [3]  have  been 
considered,  but  relatively  very  little  has  been  done  on  the 
optimal  sensing  of  nonlinear  dynamical  phenomenon.  In  the 
linear  dynamical  scenario,  the  optimal  scheduling  problem 
results  in  a  deterministic  optimal  control  problem  which 
can  be  solved  online  using  Model  Predictive  Control  [4], 
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In  the  nonlinear  case,  the  problem  is  stochastic  and  thus, 
is  significantly  harder  to  solve  because  of  the  associated 
computationally  intractable  stochastic  DP  problem.  In  this 
paper,  we  suggest  a  receding  horizon  control  approach  to 
the  solution  of  such  stochastic  sequential  decision  making 
problems,  in  particular,  I-space  sequential  decision  making 
problems,  that  bypasses  the  need  to  solve  the  stochastic  DP 
problem. 

It  is  very  well  known  that  stochastic  control  problems  with 
sensing  uncertainty,  of  which  sensor  scheduling  problems 
are  a  special  case,  can  be  posed  as  a  Markov  Decision 
Problem  (MDP)  on  the  Information  state  (I-state),  which  is 
usually  the  conditional  filtered  pdf  of  the  state  of  the  system 
[5]— [7] .  Unfortunately,  it  is  also  equally  well  known  that  such 
problems  are  notoriously  difficult  to  solve  owing  to  the  twin 
curses  of  dimensionality  and  history,  so  much  so  that  such 
problems  have  only  been  solved  for  small  to  moderate  sized 
discrete  state  space  problems  (i.e.,  wherein  the  underlying 
state  space  of  the  problem  is  discrete).  Initially,  exact  solution 
of  the  POMDPs  were  sought  [7]  utilizing  the  convexity  of 
the  cost-to-go  function  in  terms  of  the  I-state.  However, 
these  techniques  do  not  scale  well.  Thus,  focus  shifted  to 
solving  such  I-space  problems  using  randomized  point  based 
value  iteration  in  which  a  set  of  random  I-states  are  sampled 
in  the  I-space  and  an  approximate  MDP  defined  on  these 
randomly  sampled  states  is  then  exactly  solved  using  standard 
DP  techniques  such  as  value/  policy  iteration  [8] — [10]. 
These  methods  have  resulted  in  the  solution  of  much  higher 
dimensional  problems  when  compared  to  the  ones  that  can  be 
solved  using  exact  techniques,  however,  these  methods  still 
do  not  scale  to  continuous  state  and  control  space  problems. 

Model  Predictive  or  Receding  Horizon  Control  (MPC/ 
RHC)  has  been  one  of  the  most  successful  applications  of 
control  theoretic  techniques  in  the  industry  [4],  The  MPC 
techniques  solve  a  sequence  of  finite  horizon  open  loop 
control  problems  in  a  receding-  horizon  fashion  instead  of 
solving  the  infinite  dimensional  DP  equation  offline.  We 
propose  a  similar  approach  to  solve  I-space  sequential  decision 
making  problems,  wherein  a  sequence  of  open  loop  stochastic 
optimization  problems  are  solved  online  in  a  receding  horizon 
fashion.  However,  in  the  stochastic  case,  the  answers  of 
the  RHC  and  the  DP  techniques  do  not  coincide  because 
in  the  DP  formulation,  the  optimization  is  over  feedback 
policies  and  not  open  loop  control  sequences.  However,  such 
DP  problems,  in  particular,  I-space  problems,  are  virtually 
computationally  intractable  in  continuous  state  spaces  and 
thus,  the  IS-RHC  technique  provides  a  computationally 
attractive  solution  to  the  I-space  problems.  We  note  here  that 
Monte  Carlo  based  methods  for  solving  MDPs  [11],  [12]  are 
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another  computationally  attractive  alternative  to  solving  such 
high  dimensional  problems,  however,  a  detailed  comparison 
of  this  technique  with  the  MC  techniques  is  beyond  the  scope 
of  this  paper,  and  is  left  to  future  work.  At  the  same  time,  the 
empirical  results  reported  in  this  paper  show  that  the  IS-RHC 
technique  does  lead  to  better  payoffs  in  terms  of  information 
gains  when  compared  to  a  shortsighted  policy. 

The  original  contributions  of  this  paper  are  as  follows: 
1)  we  propose  an  online  receding  horizon  approach  to  the 
solution  of  the  POMDP  problem  that  results  from  the  sensor 
scheduling  problem,  and  2)  we  propose  a  simulation  based 
gradient  ascent  approach  to  the  solution  of  the  stochastic 
optimization  problems  resulting  from  the  receding  horizon 
approach  at  every  time  step.  Our  technique  is  valid  for  any 
nonlinear  autonomous  system  with  a  discrete  control  space. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  II, 
we  formulate  the  sensor  scheduling  problem.  In  Section  III, 
we  present  the  IS-RHC  technique.  In  Section  IV,  we  present 
the  convergence  analysis  of  the  technique  and  in  Section  V, 
we  present  a  simple  numerical  example  as  application  of  the 
IS-RHC  technique. 

II.  Motivation  and  Problem  Formulation 

In  this  section,  we  introduce  the  sensor  scheduling  prob¬ 
lem  that  we  wish  to  solve  in  this  paper.  Consider  a  dy¬ 
namical  system  with  state  denoted  by  X  where  X  = 
[X^-\  X^2\  ...X(N^]7  and  is  a  vector  that  represents  the 
state  of  a  dynamical  subsystem  whose  dynamics  may  (or  may 
not)  be  coupled  with  the  dynamics  of  the  other  dynamical 
subsystems.  In  the  case  of  the  SSA  problem,  the  state  of 
satellite  i  is  denoted  by  X<1> .  Let  the  dynamics  of  the  entire 
system  (the  entire  collection  of  subsystems)  be  represented  by 
the  following  nonlinear  difference  equation: 

Xk  =  F{Xk-i)  +  GpTk-OWfc-i,  (1) 

where  F(.)  and  G(.)  are  nonlinear  functions,  and  { Wk } 
is  an  uncorrelated  white  noise  sequence.  If  the  sub  dynamical 
systems  were  decoupled  the  above  equation  would  decompose 
into  N  independent  difference  equations,  one  for  each  of  the 
sub-states  The  measurement  equation  for  the  state  of 

the  system  is  denoted  by  the  following  (possibly)  nonlinear 
equation: 

zk  =  HUk(Xk)  +  14,  (2) 

where  {14}  is  a  zero  mean  uncorrelated  white  noise 
sequence,  and  HUk  (.)  is  a  nonlinear  measurement  function 
which  is  determined  by  uk.  uk  is  a  control  variable  that 
can  take  values  from  1  to  N\  uk  =  i  denotes  that  we 
make  a  measurement  of  sub-state  X^l)  at  time  k.  Although 
this  description  implies  that  we  can  only  measure  one 
sub-component  of  X  at  any  time  step,  we  might  have 
the  choice  of  making  P  >  1  measurements  as  well.  For 
notational  simplicity  we  shall  concentrate  only  on  P  =  1 


in  the  following  discussion.  The  generalization  to  P  >  1  is 
quite  straightforward. 

Let  Xk  denote  the  pdf  of  the  state  X  at  time  k.  We  shall 
call  Xk  the  information  state  of  the  system  since  it  encodes 
our  knowledge  (or  lack  thereof)  about  the  system  state  X. 
Given  the  information  state  (I-state)  at  time  k  —  1,  Xk- 1,  the 
I-state  at  time  /,:,  Xk,  will  depend  on  the  particular  component 
that  is  chosen  for  measurement  at  time  k  and  hence  on  the 
control  variable  uk.  It  is  also  clear  that  Xk  's  dependent  on 
the  noisy  observation  at  time  k,  zk.  However,  zk  is  a  random 
variable,  and  thus  Xk  is  also  random  given  Xk-\  and  the 
control  uk.  Given  Xk-i >  the  control  uk- i  and  the  current 
observation  zk,  the  current  I-state  Xk  is  obtained  via  the 
Bayes  filtering  equation  (Ch.  5,  pp.  252,  [5]).  In  fact,  the 
evolution  of  the  I-state  is  governed  by  a  Markov  chain  (MC) 
whose  transition  density  function  is  denoted  by  p(x' /Xi u) 

[5],  [6], 

Optimal  Sensor  Scheduling  Problem:  Our  objective  in  this 
work  is  to  maximize  the  total  information  about  the  dynamical 
system  over  the  infinite  horizon.  To  this  end,  let  us  define  the 
information  gain  metric  A I(x,  u)  denoting  the  expected  infor¬ 
mation  gain  in  choosing  control  u  at  I-state  X-  An  excellent 
discussion  of  metrics  for  sensor  tasking  problems  can  be  found 
in  [13].  Let  0  <  /?  <  1  denote  a  discount  factor  that  quantifies 
the  fact  that  the  information  gains  in  the  immediate  future 
are  more  important  to  us  than  the  information  gains  further 
out  in  the  future.  We  wish  to  solve  the  following  discounted 
sequential  decision  making  problem  over  all  possible  feedback 
policies  u(.): 

V*(x)  =  max  V(x,u(.)),  where  (3) 
«(■) 

OO 

V(x,u(-))  =  AI(Xt,u(xt)) frlxo  =  x\,  (4) 

t=i 

for  all  possible  information  states  x-  Since  the  I-state  x  is 
governed  by  a  controlled  Markov  chain,  the  answer  to  the 
above  problem  is  provided  by  solving  the  following  Dynamic 
Programming  problem: 

V*(x)  =  max[AJ(x,u)  J  p(x'\x,u)V*(x')dx']-  (5) 

Clearly  the  above  DP  problem  resides  in  a  continuous  and 
very  high  dimensional  state  space  thereby  making  the  problem 
computationally  intractable. 

III.  Information  Space  Receding  Horizon  Control 
(IS-RHC) 

In  this  section,  we  shall  propose  a  simulation  based  Reced¬ 
ing  Horizon  Control  approach  (IS-RHC)  to  solve  the  I-space 
MDP  problem  that  was  posed  in  the  previous  section. 

A.  Stochastic  Relaxation  of  Optimization  Problem 

Consider  again  the  statement  of  the  I-space  MDP  given  in 
Eq.  3.  Given  that  the  expected  information  gain  is  uniformly 
bounded  above,  i.e.,  \AI(x,u)\  <  M  <  oo  for  all  (%,  u)  and 
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given  the  discount  factor  S  <  1 ,  and  given  any  arbitrarily 
small  error  tolerance  e  >  0,  there  always  exists  a  finite  time  T 
such  that  the  finite  horizon  T-step  discounted  information  gain 
for  any  infinite  horizon  control  sequence 
{ut}tL\  is  arbitrarily  close  to  the  infinite  horizon  discounted 
cost-to-go  for  the  same  control  sequence.  Thus,  in  the  fol¬ 
lowing  we  shall  concentrate  on  solving  the  discounted  finite 
horizon  I-space  MDP  assuming  that  a  finite  horizon  T  and 
discount  factor  /3  is  given  such  that  the  above  approximation 
holds  thereby  leaving  us  with  a  finite  dimensional  optimiza¬ 
tion  problem  as  opposed  to  the  infinite  dimensional  problem 
resulting  from  the  original  infinite  horizon  case.  Define  the  T- 
step  information  gain  in  following  the  T-step  control  sequence 
U  =  {ui,  ■  ■  ■  ,  ut}  from  I-state  x  as  follows: 

T 

J{Xi  uu  ‘  ‘ '  ,ut)  =Ex\^2AI(xt,ut)/3t\xo=x\,  (6) 

t= l 

where  the  notation  Ex[.]  denotes  that  the  expectation  is  over 
the  sample  paths  {xo  =  > Xt}  that  are  particular 

T— step  realizations  of  the  I-space  process.  We  have  dropped 
the  subscript  T  for  notational  convenience.  The  above  equation 
is  different  from  Eq.  4  because  the  expectation  above  is  with 
respect  to  an  open  loop  policy  while  the  one  in  Eq.  4  is  with 
respect  to  a  feedback  policy.  Further,  we  define  the  optimal 
T-step  information  gain  as  follows: 

J*(x)=  max  J(x,Ui,---uT). 

Define  a  randomized  policy  II  =  {  tt-j  ,  •  •  ■  ttj- }  where  7rt  is 
a  probability  vector  such  that  t rtj  =  Pr(iq  =  j)  where  7r tj 
denotes  the  jth  component  of  7 rt.  Thus,  in  the  randomized 
policy,  we  do  not  take  a  particular  control  action  at  time  t, 
instead  we  take  the  control  action  ut  =  j,  j  =  1  •  •  •  N,  with  a 
probability  nt  j  and  Yhj  nt,j  =  1  for  all  t.  Further,  define  the 
T-step  information  gain  in  following  stochastic  policy  II  from 
I-state  x  as  follows: 

T 

Js(x,n)  =  A-f(xt,«t)^tlxo  =  x]. 

t- 1 

where  the  notation  Ex>u[.]  denotes  the  fact  that  the  expectation 
in  the  above  equation  is  now  with  respect  to  both  the  sample 
paths  {xi,  •  •  •  Xt}  and  control  sequences  {ui,  ■  ■  ■  up}-  Then, 
it  can  be  seen  that  the  following  relationship  holds: 

J«(X>n)  =  £  ,Ut  1  (7) 

the  summation  above  is  a  T-dimensional  sum  where  each  ut 
can  take  one  of  N  values.  In  the  following,  for  notational 
convenience,  we  shall  abuse  notation  and  denote  the  expected 
information  gain  due  to  a  stochastic  policy  Js{-)  by  </(.)  (the 
symbol  for  information  gain  due  to  a  deterministic  policy).  We 
wish  to  solve  the  optimization  problem: 

J*(x)  =  max  J(x,n),  (8) 

given  some  I-state  X- 


B.  Simulation  based  Stochastic  Gradient  Method 


Let  us  write  ttt,N  =  1  —  7rtji  —  ...  —  7rt;jv-i-  Then,  Eq.  7 
reduces  to  the  following  equation: 

N 

<7(x5n)=  [^2j{X,Ul,-,Ut=j,:,UT)TTtj] 

U\...Ut  j  —  1 
N—  1 

=  ^  ^  ^  ^  J (X?  *^1?  •••?  ut  ~  j->  UT )^t,j 

ui...uT  j= 1 

+J(x,u  1,  ...,Ut  =  N,  ...,ur)(  1  -  7Tt,  1  -  ...  -  7Tt)JV_l)].  (9) 


Note  that  J (x,  II)  is  a  multi-linear  function  of  the  probabilities 
ntj.  Then,  from  Eq.  9,  it  follows  that: 


dJ(x, n) 

dirtj 


y  ]  nt,Ul 

U\..Ut 


■■TtT,uT 


{J(x,u  1,  =  j,  :,UT)  -  J(x,U  1, -,Ut  =  N,  ..,UT)}.  (10) 

Consider  the  term  J2Ul..uT  nXui  ■■ttt,utj(x,  «i.  ut  =  j, uT). 

This  is  nothing  but  the  expected  T-step  information  gain  in 
following  stochastic  policy  II  whenever  ut  =  j.  Define 


•7(t,i)(X:n)  =  'Khui-^T,uTJ{X,Ul,..,Ut  =j,  ..,UT),  (11) 

U\..Ut 


where  subscript  (t,j)  denotes  the  gradient  of  J(x,n)  with 
respect  to  ntj.  Then,  using  the  above  definition  and  Eq.  10, 
it  follows  that 

dJg^U)  =  J(tj)(x,  n)  -  J(t,jv)(x>n),  (12) 

for  all  t  and  all  j.  Thus,  by  simulating  sample  I-space  trajecto¬ 
ries  under  the  stochastic  policy  II,  we  can  estimate  the  gradient 
of  the  T-step  information  gain  function  J (x,  II)  with  respect  to 
each  of  the  control  probabilities  irt  j.  Then,  the  policy  II  can 
be  improved  by  ascending  along  the  gradient  Note 

that  the  gradient  8,7g}jn-)  is  a  T  x  N  matrix  whose  (t,j) 
element  is  ■  Mathematically,  this  means  we  adjust  the 

stochastic  policy  at  iteration  n  (not  to  be  confused  with  time 
t )  as  follows: 

nn+1  =  VP{  Un  +  en^nn)|n=n„},  (13) 

where  en  is  a  small  step  size  and  Vp{.)  denotes  a  projection 
onto  the  space  of  stochastic  policies  P.  The  projection  is 
necessary  since  the  new  policy  update  need  not  satisfy  the 
constraints  required  to  be  satisfied  by  a  stochastic  policy. 
This  projection  results  in  a  quadratic  programming  problem 
whenever  the  constraints  are  violated.  However,  estimating 

exactly  is  intractable  owing  to  the  large  number  of 
simulations  required  to  do  the  estimation.  Instead,  we  can 
form  a  noisy  estimate  of  from  a  single  sample  path 

(simulation)  of  the  I-space  process  as  follows.  Recall  Eq.  12 
and  suppose  that  w  is  a  sample  realization  of  the  I-space 
process,  where  {xi(w),  «i(w),  •  •  •  Xt(w),  «t(w)}  denotes  the 
sample  I-space/  control  space  path,  with  associated  informa¬ 
tion  gain  ,/  (oj).  Then,  the  information  gradient  equation  12 
can  be  approximated  by  using  the  noisy  information  gradient 
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estimate: 


dJ(x, n) 

dntj 


J(w) 


Tf t,j 

-J{u) 

7 Tt,N 


if  ut(w)  =j, 


if  ut(w)  =  N, 


=  0,  o.w.. 


(14) 


Thus,  using  the  noisy  information  gradient  from  the  Eq.  14, 
the  policy  update  equation  may  be  written  as  follows: 

nra+i  =  VP{ Hn  +  endJ{^n)  |n=nj,  (15) 

where  dJg^n'1  is  a  T  x  TV  matrix  whose  (t,j)  element  is 

■  Using  the  noisy  policy  update  Eq.  15  above,  we 
improve  the  policy  by  ascending  the  gradient  and  in  the  limit, 
we  would  hope  to  reach  an  optimum  point  for  the  information 
gain  function  J(x,n).  A  convergence  analysis  for  the  above 
procedure  is  provided  in  Section  IV.  A  remark  regarding  the 
noisy  information  gradient  equation  14  is  in  order  here. 

Remark  1.  Note  the  division  by  nt.j  or  tt tjj\r  in  Eq.  14. 
If  TTt.j  =  0,  then  the  aforementioned  division  results  in  the 
indeterminate  form  0/0.  Physically,  if  ttt,j  =  0  for  some  t,j, 
then  the  control  Ut  =  j  will  never  be  taken  under  policy 
II.  Thus,  we  can  never  form  an  estimate  of  Jt,j(Xi  H),  the 
information  gain  whenever  Ut  =  j,  and  therefore  can  never 
form  an  estimate  of  the  gradient  of  the  information  gain  with 
respect  to  ntj.  Hence,  Eq.  14  is  only  used  when  7rtJ-  0. 
In  the  case  when  irtj  =  0,  we  can  circumvent  the  problem  as 
follows.  Before  any  I-space  sample  path  simulation,  we  choose 
one  of  the  time  steps  t  for  which  TTt.j  =  0  for  some  j  with 
some  positive  probability  5,  i.e.,  we  follow  nt  in  choosing  ut 
with  probability  (1  —  (5)  else  we  do  the  following:  we  randomly 
choose  one  of  the  j  such  that  =  0  and  apply  this  control 
action  at  time  t.  For  all  other  r  we  follow  ttt  in  choosing 
control  actions  uT.  In  essence,  using  the  above  procedure, 
we  sporadically  explore  actions  that  would  otherwise  never 
be  taken  under  II  with  a  small  positive  probability  while 
keeping  the  rest  of  the  policy  the  same.  This  allows  us  to 
form  an  unbiased  estimate  of  the  information  gradient  with 
respect  to  element  (t,j)  for  which  TTt.j  =  0-  This  exploratory 
procedure  is  analogous  to  the  exploration  steps  that  are  used 
during  the  policy  evaluation  step  in  Actor-Critic  algorithms  for 
Reinforcement  learning,  as  well  as  the  exploratory  off-policy 
steps  that  are  used  in  Q-learning  [14],  [15]  . 


time  1,  z\,  and  update  our  I-state  according  to  the  Bayes 
filteringing  equation  to  get  the  I-state  at  time  1,  Xi-  Assuming 
that  the  underlying  system  is  autonomous  (note  that  Eq.  1  is 
time  independent  and  hence,  autonomous),  then  we  can  reset 
time  to  0,  make  Xi  our  new  initial  I-state  \o  and  repeat  the 
procedure  outlined  above.  In  this  fashion,  at  every  time  step, 
given  the  current  I-state,  the  T-step  stochastic  optimization  can 
be  done  online  and  applied  in  a  receding  horizon  fashion. 
Mathematically,  the  RHC -based  feedback  control  for  I-state 
X  can  be  written  as: 

urhc(x)  =  A  argmax  J(x,n),  (16) 

where  e\  is  the  first  unit  vector  in  i  f  (e\  isolates  the  control 
at  the  first  time  instant  of  the  T-step  open  loop  control  policy). 
The  above  recursive  procedure  is  summarized  in  the  pseudo¬ 
code  IS-RHC. 


Algorithm  1  Algorithm  IS-RHC 

•  Given  initial  information  state  xo>  lookahead  horizon  T, 
initial  policy  Hi  and  error  tolerance  5 

1)  n  =  1,  define  |  |Hi  —  Ho  1 1  =  <5  +  1 

2)  WHILE  ||nn-nn_1||  >  6 

DO 

a)  Generate  sample  I-space  path  {Xt(w)}£=  t  start¬ 
ing  with  initial  I-state  xo- 

b)  Use  Eq.  14  to  form  the  noisy  estimate  of  the 
information  gradient  using  the  sample  path. 

c)  Use  Eq.  15  to  update  the  policy. 

3)  Output  converged  T-step  policy  n*  =  [7r*..7r;jn]  and 
choose  control  U\  according  to  7t*. 

4)  Observe  noisy  measurement  z\  and  update  using  the 
Bayes  filter  equation  to  obtain  the  new  I-state  xi- 

5)  Set  xo  =  Xi  and  g°  t0  Step  1. 

•  End 


Remark  2.  The  feedback  policy  that  results  from  the  IS- 
RHC  (ref.  Eq.  16)  is  different  from  that  which  would  result 
from  solving  the  DP  equation  (ref.  Eq.  4).  In  the  DP  case 
the  expectation  is  with  respect  to  the  sample  paths  that  are 
generated  as  a  result  of  a  feedback  policy  u(.)  while  in  the 
case  of  IS-RHC  the  expectation  is  with  respect  to  the  sample 
paths  generated  by  an  open  loop  (not  feedback)  sequence  of 
control  actions  {ui,  •  •  •  ut}- 


C.  IS-RHC  Algorithm 

Suppose  at  time  t  =  0  the  I-state  of  the  system  is  xo-  Also 
suppose  that  we  are  given  some  initial  guess  for  the  optimal 
T-step  stochastic  policy,  say  Ho-  Then  using  the  simulation 
based  noisy  gradient  estimate  from  Eq.  14  and  the  policy 
improvement  step  from  Eq.  15,  we  can  ascend  the  gradient 
of  the  function  J(x,  n)  and  find  an  optimum  w.r.t  n.  This 
gives  us  a  T-step  policy  =  {7rJ...7r^}.  As  in  the  standard 
Receding  Horizon  control  approach,  we  apply  the  control  Mi 
according  to  7 t*.  Next  we  observe  the  noisy  measurement  at 


D.  Convergence  Analysis 

In  the  following  we  drop  all  reference  to  the  initial  I-state  x 
in  the  optimization  problem  J(x,  n)  and  refer  to  the  function 
as  only  J(n).  The  gradient  of  the  function  J(.)  with  respect  to 
n  is  denoted  by  f?(n).  Let  {<7;(n)  <  0}  denote  the  constraints 
on  the  problem  for  some  i  =  1,  ...K.  The  constraints  are  all 
linear  and  are  of  the  form  0  <  TTt.j  <  1  f°r  aH  t  £  {1...T},  j  £ 
{1...7V},  and  J2j  Kt,j  <  1  for  all  t  £  {1..T}.  However,  note 
that  the  information  gain  function  is  multilinear  and  in  general, 
can  have  multiple  local  minima.  Let  the  compact  set  defined 
by  the  constraints  above,  the  space  of  stochastic  policies,  be 
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denoted  by  P.  Let  us  denote  the  set  of  stationary  points  of 
J(.)  by  S  where 

5  =  {n :  0(n)  -  ]T  Ai9i(n)  =  o,  a,  >  o},  (17) 

i 

where  Aj  =  0  whenever  q-ffA)  <  0  and  \  >  0  otherwise. 
Note  that  the  set  S  is  the  collection  of  all  the  Kuhn-Tucker 
(K-T)  points  of  the  function  J(II).  The  set  is  non-empty  since 
J(II)  is  continuous  and  the  set  P  of  stochastic  policies  is 
compact,  and  therefore  the  function  will  attain  its  extrema  in 

P.  Moreover,  the  set  Si  can  be  decomposed  into  disjoint,  con¬ 
nected  and  compact  subsets  S  such  that  J (II)  =  constant  =  Ci 
over  each  S,  (see  [16],  p.  126),  since  J(.)  and  g.j(.)  are  twice 
continuously  differentiable.  Let  the  step  size  parameters  satisfy 
the  following  conditions  which  is  a  standard  assumption  for 
most  stochastic  approximation  algorithms  [16]: 

=  00  ’  ^2en  <  °0- 
n  n 

Then  the  following  result  holds: 

Proposition  1.  The  sequence  of  policy  updates  IIr,  — t  Si  for 
some  unique  i. 

Proof:  Write  the  constrained  noisy  policy  update  equation 
15  as 

n„+i  =  n„  +  e„(6?(nn)  +  zn),  (18) 

zn  £  67(11,,),  where  67(11)  denotes  the  infinite  convex  cone 

generated  by  the  outward  normals  of  the  active  constraints  at 
point  II.  A  convex  cone  generated  by  vectors  iq  and  v-2  is  the 
set  {v  :  v  =  a\V\  +  }  where  0:1,2  >  0.  Note  that  zn  =  0 

if  1 1  „  is  in  the  interior  of  the  constraint  set  P.  Further,  the 
above  expression  may  be  rewritten  as  follows: 

Yn  =  g(nn)  =  s(iin)  +  g(nn)  -  E[g( n„)|nn] 

S  'N' - * 

+  E[g(nn)\un}  -g{un).  (19) 

N _ y 

In 

It  can  easily  be  seen  that  (?(.)  is  an  unbiased  estimator  of  C?(.) 
and  hence,  it  follows  that  yn  =  0  above.  Further,  due  to  the 
same  reason,  E[f)n\TYn]  =  0  and  thus,  {i/jn}  is  a  Martingale 
difference  sequence.  These,  along  with  the  step  size  condition 
above,  imply  that  all  the  assumptions  of  Theorem  2.1  (p.  127) 
of  [16]  are  satisfied  and  hence,  An  — »•  S,  for  some  unique  i. 

Q. E.D. 

IV.  Illustrative  Examples 

In  this  section,  we  shall  present  an  application  of  the 
IS-RHC  technique  to  an  illustrative  example  containing 
N  decoupled  2-dimensional  oscillators.  This  abstract  class 
of  problems  involving  an  array  of  simple  oscillators  was 
chosen  because  it  is  relatively  simple,  yet  resembles  the 
motivating  satellite  tracking  problem  in  that  the  dynamical 
subsystems  exhibit  periodic  behavior.  The  decoupled  nature 
of  the  oscillators  only  affects  the  filtering  algorithm  used  to 
generate  the  sample  I-space  paths,  otherwise,  the  method  is 


Normalized  Final  Information  Utility 


Greedy  No  Smear  Half  Smear  Full  Smear 


Fig.  1 .  Comparison  of  the  RHC  and  shortsighted  policies  for  a  sample  initial 
I-state  for  the  case  of  N  =  4  Van  Der  Pol  oscillators.  The  bars  represent  the 
mean  final  information  utility;  the  box  plots  show  the  median  (red  line),  first 
(upper  blue  line)  and  third  quartiles  (lower  blue  line),  and  range  of  the  results. 
There  are  3  versions  of  the  RHC  policy.  In  the  “no  smear”  approach,  we 
initialize  the  solution  of  any  N-step  horizon  with  the  solution  of  the  previous 
N  step  horizon,  over  the  first  N-l  steps  of  the  current  horizon  while  smearing 
the  last  step,  i.e,  setting  all  probabilities  at  that  step  equal.  In  the  half  smear 
approach,  the  solution  is  initialized  by  randomizing  a  part  of  solution  over 
the  previous  time  horizon,  while  in  the  full  smear  approach,  the  solution  of 
any  time  horizon  is  fully  randomized,  i.e.,  all  actions  at  every  time  are  set  to 
be  equally  likely  thereby  erasing  all  knowledge  from  the  optimization  over 
the  previous  time  horizon. 


independent  of  such  coupling. 

The  information  gain  metric  that  is  used  is  the  following: 

A I(x,u)  =  E[det{P~l)  -  detiP-1)}, 

where  det(A)  represents  the  determinant  of  the  matrix  A,  I\ 
is  the  covariance  of  the  I-state  x  and  is  the  covariance 
of  the  I-state  resulting  from  taking  control  u  at  I-state  \- 
The  experimental  results  shown  below  are  for  a  set  of  4  2- 
dimensional  Van  der  Pol  oscillators.  We  assume  that  an  EKF 
can  be  used  to  keep  track  of  the  I-states  and  hence,  the  I- 
state  can  be  approximated  as  a  Gaussian  distribution.  The 
state  space  of  the  I-space  MDP  problem  in  this  case  is  then 
4  x  (2  +  3)  =  20  dimensional  (2  dimensional  mean  and  3 
dimensional  covariance  times  the  number  of  oscillators).  We 
test  our  technique  against  a  short  sighted  policy  that  only 
looks  one  step  ahead  while  our  RHC  technique  looks  five 
steps  ahead.  The  results  of  our  experiments  are  summarized 
in  Figures  1  and  2.  In  Figure  1,  we  see  that  the  average 
gain  of  the  I-RHC  technique  over  that  of  a  shortsighted 
policy  is  approximately  500-600%.  The  information  gain  is 
calculated  over  a  horizon  of  20  times  steps  for  300  different 
initial  conditions.  In  Figure  2,  we  show  a  particular  tasking 
example  wherein  the  RHC  outperforms  the  shortsighted  policy 
by  a  very  large  margin  (approximately  25  times  better).  This 
simple  example  provides  empirical  evidence  that  the  IS-RHC 
technique  does  indeed  result  in  control  policies  that  maximize 
the  information  gained  about  the  system  over  the  long  run 
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Fig.  2.  Comparison  of  the  RHC  and  shortsighted  policies  for  a  sample  initial  I-state  for  the  case  of  N  =  4  oscillators.  The  shortsighted  oscillator  repeatedly 
measures  the  third  oscillator  which  has  high  process  noise  and  a  low  sensor  noise,  thereby  giving  the  highest  short  term  information  gain.  The  RHC  techniques, 
on  the  other  hand,  let  the  uncertainty  of  the  third  oscillator  grow,  while  measuring  other  oscillators,  knowing  that  the  third  oscillator  can  be  measured  later 
and  its  uncertainty  reduced.  This  long  sighted  strategy  results  in  a  much  higher  information  gain  for  the  RHC  techniques  when  compared  to  the  shortsighted 
strategy. 


when  compared  to  shortsighted  policies.  Further,  to  the  best 
of  our  knowledge,  even  for  the  simple  examples  considered  in 
this  paper,  the  dynamic  programming  problems  are  very  high 
dimensional  and  the  I-space  RHC  technique  seems  to  be  the 
only  one,  in  addition  to  Monte  Carlo  based  methods  [11],  [12], 
that  is  capable  of  handling  such  high  dimensional  problems. 
However,  the  comparison  with  these  MC  based  methods  is 
beyond  the  scope  of  this  paper  and  will  be  done  in  a  follow 
up  paper. 


V.  Conclusion 

In  this  paper,  we  have  proposed  a  receding  horizon  control 
based  approach  to  solve  I-space  MDP,  termed  IS-RHC,  in¬ 
stead  of  solving  the  associated  computationally  intractable  DP 
equation.  We  proposed  a  simulation  based  stochastic  gradient 
technique  for  solving  the  open  loop  stochastic  optimization 
problem  that  results  at  every  time  step  due  to  the  IS-RHC 
technique.  We  have  tested  the  IS-RHC  on  a  simple  example, 
which  nevertheless  result  in  large  DP  problems  that  are  well 
beyond  the  capability  of  existing  techniques,  and  the  results 
show  that  the  IS-RHC  technique  does  result  in  significant 
improvement  in  the  information  gained  regarding  the  system 
when  compared  to  a  shortsighted  policy.  Further  research  will 
focus  on  testing  the  IS-RHC  on  more  realistic  examples  as 
well  as  extending  the  formulation  such  that  constraints  on  the 
information  process  and  the  problem  of  decentralized  control 
for  multiple  sensors  can  be  taken  into  account. 


References 

[1]  T.  S.  Kelso  et  al.,  “Improved  conjunction  analysis  via  collaborative 
space  situational  awareness,”  in  Proc.  AAS/  AIAA  Spaceflight  Mechanics 
Meeting,  2008. 

[2]  F.  Bourgault  et  al.,  “Information  based  adaptive  robot  exploration,”  in 
Proc.  of  2002  IEEE/  RSJ  Int.  conf.  on  Intell.  rob.  sys.,  2002. 

[3]  S.  Aranda  et  al.,  “On  optimal  sensor  placement  and  motion  coordination 
for  target  tracking,”  in  Proc.  IEEE  Int.  conf.  Dec.  Cont.,  2005. 

[4]  D.  Q.  Mayne  et  al.,  “Constrained  model  predictive  control:  Stability  and 
optimality,”  Automatica,  vol.  36,  pp.  789-814,  2000. 

[5]  D.  R  Bertsekas,  Dynamic  Programming  and  Optimal  Control,  vols  I  and 
II.  Cambridge:  Athena  Scientific,  2000. 

[6]  P.  R.  Kumar  and  P.  P.  Varaiya,  Stochastic  Systems:  Estimation,  Identifi¬ 
cation  and  Adaptive  Control.  Prentic  Hall,  NJ:  Prentice  Hall,  1986. 

[7]  L.  P.  Kaelbling  and  M.  L.  Littman,  “Planning  and  acting  in  partially 
observable  stochastic  domains,”  Artificial  Intelligence,  vol.  101,  pp.  99- 
134,  1998. 

[8]  N.  Roy,  G.  Gordon,  and  S.  Thrun,  “Finding  approximate  pomdp  so¬ 
lutions  through  belief  compression,”  Journal  of  artificial  intelligence 
research,  vol.  23,  pp.  1-40,  2005. 

[9]  M.  Spaan  and  N.  Vlassis,  “Perseus:  Randomized  point-based  vallue 
iteration  for  pomdps ,”  Journal  of  Artificial  Intelligence  Research,  vol.  24, 
pp.  195-220,  2005. 

[10]  J.  Pineau  et  al.,  “Anytime  point  based  approximations  for  large  pomdps,” 
Journal  of  Artificial  Intelligence  Research,  vol.  27,  pp.  335-380,  2006. 

[11]  M.  Kearns  et  al.,  “A  sparse  sampling  algorithm  for  near-optimal  planning 
in  large  markov  decision  processes,”  in  Proceedings  of  the  IJCA1,  1999. 

[12]  L.  Kocsis  and  C.  Szepesvari,  “Bandit  based  monte  carlo  planning,”  in 
In:  ECML-06.  Number  4212  in  LNCS,  2006. 

[13]  A.  O.  Hero  III,  D.  A.  Castanon,  D.  Cochran,  and  K.  Kastella,  Eds., 
Foundations  and  Applications  of  Sensor  Management.  Springer  US, 
2009. 

[14]  D.  P.  Bertsekas  and  J.  N.  Tsitsiklis,  Neuro-Dynamic  Programming. 
Boston,  MA:  Athena  Scientific,  1996. 

[15]  R.  S.  Sutton  and  A.  G.  Barto,  Reinforcement  Learning:  An  Introduction. 
Cambridge,  MA:  MIT  Press,  1998. 

[16]  H.  J.  Kushner  and  G.  G.  Yin,  Stochastic  Aproximations  and  Recursive 
Algorithms  and  Applications,  2nd  Edition.  New  York,  NY:  Springer- 
Verlag,  2003. 


Information  Space  Receding  Horizon  Control  for  Multi-Agent  Systems 

Z.  Sunberg,  S.  Chakravorty,  R.  Erwin 


Abstract — In  this  paper,  we  present  a  receding  horizon 
solution  to  the  problem  of  optimal  scheduling  for  multiple 
sensors  monitoring  a  group  of  dynamical  targets.  The  term 
’target’  is  used  here  in  the  classic  sense  of  being  the  object  that 
is  being  sensed  or  observed  by  the  sensors.  This  problem  is 
motivated  by  the  Space  Situational  Awareness  (SSA)  problem. 
The  multi-sensor  optimal  scheduling  problem  can  be  posed 
as  a  multi-agent  Partially  Observed  Markov  Decision  Process 
(POMDP)  whose  solution  is  given  by  an  Information  Space 
(I-space)  Dynamic  Programming  (DP)  problem.  We  present  a 
simulation  based  stochastic  optimization  technique  that  exploits 
the  structure  inherent  in  the  problem  to  obtain  variance 
reduction  along  with  a  distributed  solution.  This  stochastic 
optimization  technique  is  combined  with  a  receding  horizon 
approach  which  obviates  the  need  to  solve  the  computationally 
intractable  multi-agent  I-space  DP  problem  and  hence,  makes 
the  technique  computationally  tractable  for  such  problems.  The 
technique  is  tested  on  a  simple  numerical  example  which  is 
nonetheless  computationally  intractable  for  existing  solution 
techniques. 

I.  Introduction 

In  this  paper,  we  consider  the  problem  of  optimal  schedul¬ 
ing  for  multiple  sensors  such  that  the  information  gained 
by  the  sensors  is  maximized.  The  class  of  problems  that 
is  considered  is  motivated  by  the  so-called  Space  Situa¬ 
tional  Awareness  (SSA)  problem.  It  is  easily  shown  that  the 
scheduling  problem,  in  general,  may  be  posed  as  a  Partially 
Observed  Markov  Decision  Problem  (POMDP)  whose  so¬ 
lution  is  given  by  an  information  space  (I-space)  Dynamic 
Programming  (DP)  problem.  In  the  case  of  multiple  agents, 
the  resulting  problem  is  a  multiple  agent  I-space  DP  problem 
that  is  impossible  to  solve  computationally  owing  to  the  ex¬ 
ponential  complexity  of  the  problem  in  terms  of  the  number 
of  agents  and  the  resulting  exponential  explosion  of  the  state 
and  control  spaces.  We  propose  a  generalization  of  an  I-space 
receding  horizon  control  (I-space  RHC:  IS-RHC)  approach 
that  we  had  proposed  to  the  single  sensor  problem  in  previous 
work  [1],  to  the  case  of  multiple  agents.  The  solution  strategy 
is  termed  the  I-space  RHC  multi-agent  technique  (I-RHC- 
M).  The  online  stochastic  optimization  problems  that  result 
from  the  receding  horizon  approach  are  solved  using  a 
simulation  based  gradient  ascent  technique.  The  underlying 
structure  of  the  problem  allows  us  to  drastically  reduce 
the  variance  of  the  gradient  estimates  while  allowing  for  a 
distributed  implementation  of  the  gradient  ascent  technique. 
The  technique  is  tested  on  a  simple  example  to  show  the 
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efficacy  of  the  method.  In  recent  years,  the  optimal  sensing 
problem  has  garnered  a  lot  of  interest  in  the  Control  and 
Robotics  community  and  is  variously  known  as  Information- 
theoretic  Control/  Active  Sensing  and  Dual  Control  [2]-[12], 
Discrete  dynamic  scenarios  such  as  target  tracking  [6]-[9], 
and  linear  spatially  distributed  systems  [13],  [14]  have  been 
considered,  but  relatively  very  little  has  been  done  on  the 
optimal  sensing  of  nonlinear  dynamical  phenomenon.  In  the 
linear  dynamical  scenario,  the  optimal  scheduling  problem 
results  in  a  deterministic  optimal  control  problem  which 
can  be  solved  using  Model  Predictive  control  (see  below). 
In  the  nonlinear  case,  the  problem  is  stochastic  and  thus, 
is  significantly  harder  to  solve  since  we  have  to  solve  the 
associated  stochastic  DP  problem.  In  the  past  decade,  there 
has  also  been  a  significant  volume  of  research  on  the  prob¬ 
lems  of  co-operative  sensing,  estimation  and  control  [15], 

[16] .  These  techniques  have  considered  various  classes  of 
multi-agent  systems  and  have  proposed  distributed  estimation 
and  control  schemes  for  such  problems  including  formation 
keeping,  flocking  and  distributed  sensing.  The  multi  sensor 
scheduling  problem  we  consider  in  this  paper  also  falls  under 
the  category  of  multi-agent  systems.  However,  the  structure 
of  the  problems  that  we  consider,  motivated  by  the  SSA 
problem,  is  unlike  any  other  in  the  aforementioned  literature. 
In  particular,  the  problem  we  consider  has  a  time  varying 
graph  structure  that  introduces  further  complexity  into  the 
problem  and  none  of  the  above  techniques  are  applicable.  In 
this  paper,  we  suggest  a  receding  horizon  control  approach 
to  the  solution  of  such  stochastic  multi-agent  sequential 
decision  making  problems,  in  particular,  I-space  sequential 
decision  making  problems  for  multiple  agents,  that  allows 
us  to  account  for  all  the  complexities  introduced  by  the 
class  of  problems  representing  the  SSA  problem.  Further, 
the  underlying  structure  of  the  problem  is  exploited  to  obtain 
variance  reduction  of  the  gradient  estimation  that  is  required 
by  the  technique  as  well  as  a  distributed  implementation  of 
the  technique. 

It  is  very  well  known  that  stochastic  control  problems  with 
sensing  uncertainty,  of  which  sensor  scheduling  problems  are 
a  special  case,  can  be  posed  as  a  Markov  Decision  Problem 
(MDP)  on  the  I-state,  which  is  usually  the  conditional  filtered 
probability  density  function  (pdf)  of  the  state  of  the  system 

[17]  — [19].  Unfortunately,  it  is  also  equally  well  known  that 
such  problems  are  notoriously  difficult  to  solve  owing  to  the 
twin  curses  of  dimensionality  and  history,  so  much  so  that 
such  problems  have  only  been  solved  for  small  to  moderate 
sized  discrete  state  space  problems  (i.e.,  wherein  the  under¬ 
lying  state  space  of  the  problem  is  discrete).  Initially,  exact 
solution  of  the  POMDPs  were  sought  [19],  [20]  utilizing  the 


convexity  of  the  cost-to-go  function  in  terms  of  the  I-state. 
However,  these  techniques  do  not  scale  well.  Thus,  focus 
shifted  to  solving  such  I-space  problems  using  randomized 
point  based  value  iteration  in  which  a  set  of  random  I-states 
are  sampled  in  the  I-space  and  an  approximate  MDP  defined 
on  these  randomly  sampled  states  is  then  exactly  solved  using 
standard  DP  techniques  such  as  value/  policy  iteration  [21]- 
[23],  These  methods  have  resulted  in  the  solution  of  much 
higher  dimensional  problems  when  compared  to  the  ones 
that  can  be  solved  using  exact  techniques,  however,  these 
methods  still  do  not  scale  to  continuous  state,  observation 
and  control  space  problems.  The  problem  we  consider  in 
this  paper  is  a  multi-agent  POMDP  and  the  state  and  control 
space  of  the  problem  explodes  exponentially  in  terms  of  the 
number  of  agents.  Thus,  these  problems  are  exponentially 
harder  to  solve  computationally  when  compared  to  single 
agent  I-space  problems.  There  has  been  considerable  interest 
of  late  in  solving  multi-agent  MDP  problems  that  are  tailored 
to  exploit  the  structure  that  is  inherent  in  such  problems 
with  Value/  Policy  Iteration  as  well  as  reinforcement  learning 
based  techniques  [24]— [28] .  However,  the  class  of  problems 
that  we  consider  in  this  paper  not  only  do  not  conform  to 
the  structure  required  by  these  techniques,  but  also  have 
continuous  state  and  observation  spaces  to  which  these 
methods  do  not  scale.  The  I-RHC-M  technique  sequentially 
solves  open  loop  optimization  problems  given  the  current  In¬ 
state  of  the  system  which  precludes  having  to  explore  the 
huge  state  space  of  multi-agent  MDPs  and  thereby,  keeps 
the  method  computationally  tractable. 

Model  Predictive  or  Receding  Horizon  Control  (MPC/ 
RHC)  has  been  applied  very  successfully  in  industry  [29], 
[30],  In  the  deterministic  setting,  the  MPC  technique  and 
the  Dynamic  Programming  technique  essentially  give  the 
same  answer.  The  MPC  techniques  solve  a  sequence  of  finite 
horizon  open  loop  control  problems  in  a  receding  horizon 
fashion  instead  of  solving  the  infinite  dimensional  DP  equa¬ 
tion  offline.  In  this  fashion,  constraints  on  the  systems  can 
be  taken  into  account,  which  is  very  difficult  in  DP.  This 
has  led  to  many  successful  applications  [29],  [30].  Recently, 
there  has  been  increasing  interest  in  stochastic  receding 
horizon  control  (SRHC)  approaches  [3 1  ]— [33]  that  provide 
receding  horizon  approaches  to  constrained  stochastic  control 
problems.  However,  many  of  these  techniques  have  been 
developed  for  linear  systems  with  analytical  models  of  the 
dynamics  and  constraints.  In  our  case,  an  analytical  model  of 
the  process  does  not  exist,  instead  we  have  access  to  simu¬ 
lations.  We  propose  an  SRHC  approach  to  solve  multi-agent 
I-space  sequential  decision  making  problems.  A  sequence  of 
open  loop  stochastic  optimization  problems  are  solved  online 
using  a  distributed  simulation-based  optimization  technique 
in  a  receding-horizon  fashion.  It  should  be  noted  that  in 
the  stochastic  case,  the  RHC  and  DP  techniques  do  not 
coincide  because  in  the  DP  formulation,  the  optimization  is 
over  feedback  policies  and  not  open  loop  control  sequences 
as  in  the  I-RHC-M  technique.  However,  such  DP  prob¬ 
lems,  in  particular,  I-space  problems,  especially  multi-agent 
problems,  are  computationally  intractable  in  continuous  state 


and  observation  spaces,  and  thus,  the  I-RHC-M  technique 
provides  a  computationally  attractive  solution.  The  empirical 
results  show  that  the  I-RHC-M  technique  does  lead  to  better 
payoffs  in  terms  of  information  gains  when  compared  to  a 
shortsighted  strategy. 

The  rest  of  the  paper  is  organized  as  follows:  In  Section  II, 
we  formulate  the  class  of  multi  sensor  scheduling  problems 
of  interest,  primarily  motivated  by  the  SSA  problem.  In  Sec¬ 
tion  III,  we  present  the  I-RHC-M  technique  for  the  solution 
of  this  class  of  problems.  In  Section  IV,  we  present  a  simple 
numerical  example  involving  multiple  sensors  measuring  a 
group  of  nonlinear  simple  pendulums,  which  is  nonetheless 
intractable  for  other  existing  techniques  in  the  literature,  as 
a  proof  of  concept,  of  the  I-RHC-M  technique. 

II.  Model  and  Problem  Formulation 

In  this  section,  we  model  the  class  of  multiple  sensor 
scheduling  problems  that  we  are  interested  in  solving  in 
this  work.  This  class  of  problems  is  motivated  by  the  Space 
Situational  Awareness  problem  (SSA)  but  can  be  extended 
in  a  straightforward  fashion  to  a  broader  class  of  problems. 

We  are  interested  in  tracking  a  set  of  N  targets  where  the 
state  of  the  ith  target  is  governed  by  the  stochastic  ODE: 

Xi  =  fi{xi)  +  giWi,  (1) 

where  il\  is  a  white  process  noise  term  perturbing  the 
motion  of  target  i.  The  term  ’target’  is  used  here  in  the  classic 
sense  of  being  the  object  that  is  being  sensed  or  observed 
by  the  sensors. 

We  assume  that  there  are  M  sensors  S  =  {5j  },  typically 
M  «  N,  and  suppose  that  every  sensor  j  can  make  a 
measurement  of  one  among  a  set  of  targets  at  any  given 
point  in  time  denoted  by  the  set  TJ(i),  where 

T°(t)  =  {k  £  [1, ..,  iV]|target  k  is  visible  to  sensor  j}. 

We  make  the  following  assumption  to  simplify  the  pre¬ 
sentation  of  our  technique,  however,  it  can  be  relaxed  in  a 
relatively  straightforward  fashion. 

A  1.  Any  target  ”‘i”,  at  any  time  ”t”,  is  in  the  field  of  view 
(FOV)  of  only  one  sensor. 

Further,  let  us  denote  by  S(i,t ),  the  unique  sensor  that 
can  see  target  i  at  time  t,  i.e,  S  :  TxFl  -A  S ,  is  an  integer 
valued  function  that  maps  the  product  space  of  the  target  set 
T  and  the  time  horizon  FI  =  [0, ...  H]  into  a  unique  positive 
integer  denoting  a  particular  sensor  in  the  set  of  sensors  S. 
We  make  the  following  assumption. 

A  2.  The  function  S(i,t)  is  known  a  priori  for  a  given  time 
horizon  Fl¬ 
its  obvious  that  the  following  relationship  holds  between 

Ti(t)  and 

Tj(t)  =  {ieT\S(i,t)=j}.  (2) 

Thus,  knowing  S(i,  t)  allows  us  to  find  T:ft)  and  vice-versa. 

The  above  assumption  allows  us  to  simplify  the  problem 
somewhat  by  assuring  us  that  the  set  of  control  choices 


available  to  the  different  sensors  is  deterministic,  albeit  time 
varying.  We  note  that  the  S(i,t)  function  can  be  thought  of 
as  a  ’’most  likely”  a  priori  estimate  of  the  sensors’  control 
choices,  and  discrepancies  due  to  the  stochasticity  of  the 
system  can  be  accounted  for  in  the  planning  phase.  For 
instance,  if  there  is  a  target  in  view  of  a  sensor  that  is 
not  predicted  by  S(i,t)  then  the  sensor  will  never  look  at 
that  target,  and  if  a  target  that  was  predicted  to  be  there  is 
not,  then  the  reward  for  making  a  measurement  of  the  non¬ 
existent  target  would  be  negative  as  the  uncertainty  would 
increase,  and  hence,  the  control  policy  would  learn  to  avoid 
such  a  choice. 

Suppose  now  that  a  sensor  j  can  make  a  measurement  of 
precisely  one  of  the  targets  in  its  FOV  at  time  t,  i.e., 

Ui  =  Hj{xi)  +  vj, where*  G  TJ(t),  (3) 

and  Vj  is  a  white  measurement  noise  process  corrupting  the 
measurements  of  sensor  j. 

Given  the  measurements  of  a  target  i  till  time  t,  we 
assume  that  some  suitable  Bayes  filter  is  used  to  estimate 
its  conditional  pdf.  Let  us  denote  its  pdf  /  Information  state 
(I-state)  by  Xi(t).  Let  denote  the  control  action  of 

sensor  S(i,t)  at  time  t,  i.e.,  the  target  that  sensor  S(i,t) 
chooses  to  measure  from  among  the  targets  in  its  FOV  at 
time  t. 

Let  the  incremental  reward/  utility/  information  gain  of 
taking  control  uf1'1'1'1  for  target  i,  at  time  t,  be  denoted 
by  A Then,  the  total  reward  of  using  a 
sequence  of  control  policies  over  a  time  horizon  TL  for  target 
i,  {wf(M)(.)}"o,  is  8iven  bY: 

Vixuiu^  (■)}?=  o) 

H 

=  E[J2AI(Xi{t),uf(l’t\x(.t)))/Xi{0)  =  Xi\ ■  (4) 

t=o 

In  the  above  expression,  the  expectation  is  over  al  in¬ 
formation  trajectories  that  result  from  the  feedback  policies 
uf  (’’*)(.).  In  general,  the  feedback  control  function  for  any 
sensor  S(i,t )  that  sees  target  i  at  time  t,  is  a  function  of  the 
composite  I-state  of  all  the  targets  x  =  {Xi>'--Xiv},  not 
just  Xi-  We  assume  that  the  total  reward  for  the  system  is 
the  sum  of  the  rewards  of  the  individual  targets. 

The  problem  can  then  be  posed  as  one  of  maximizing 
the  total  reward  of  the  system  over  all  feasible  feedback 
policies  of  the  individual  sensors.  Exploring  the  entire  state 
and  control  spaces  is  essentially  impossible  in  this  case 
owing  to  the  huge  dimensionality  of  the  problem.  Further,  in 
this  case,  the  DP  solution  is  necessarily  time  varying  which 
complicates  the  solution  of  the  DP  problem  further. 

III.  Multi-agent  Information  Space  Receding 
Horizon  Control(I-RHC-M) 

In  previous  work,  we  have  proposed  an  I-space  receding 
horizon  control  approach  that  involves  solving  an  open  loop 
stochastic  optimization  problem  at  every  time  step,  for  the 
case  of  scheduling  the  measurements  of  a  single  sensor.  In 
this  section,  we  shall  extend  this  approach  to  the  problem  of 


multiple  sensors  in  the  scenario  formulated  in  the  previous 
section. 


A.  The  Open  Loop  optimization  Problem 

First,  we  shall  look  at  the  open  loop  optimization  problem, 
i.e.,  an  optimization  problem  where  the  finite  horizon  reward 
function  J(x-  U)  is  a  function  of  a  sequence  of  a  given  initial 
I-state  x  and  a  sequence  of  open  loop  control  actions  U,  as 
opposed  to  the  feedback  control  policies  considered  in  the  DP 
formulation  in  the  previous  section  (note  that  we  distinguish 
the  open  and  closed  loop  reward  functions  using  J(.)  and 
V(.)  respectively).  In  particular,  we  would  like  to  solve  the 
open  loop  stochastic  optimization  problem: 

N 

max^  J(xi,{«f(M)}),  (5) 

WI  7=1 


where  the  optimization  is  over  all  possible  control  choices 
of  every  sensor-time  2-tuple  (j.  t).  In  the  above  notation 
rtf' denotes  the  control  choices  of  sensor  S(i,  t)  at  time  t. 
We  use  this  notation  because  the  total  reward  of  the  system 
can  be  defined  in  terms  of  the  individual  rewards  of  the 
different  targets  and  it  further  allows  us  to  extract  structure 
from  the  problem.  Since  sensor  S(i,t)  may  be  seeing  other 
targets  j  G  Ts(l,t\t),  we  note  that  S(i,t)  =  S(J,t )  for  all 
j  G  Ts’I*,t)( t).  Thus,  the  choices  G  i.e., 

the  sensor  S(i,  t)  can  choose  to  measure  any  of  the  targets  in 
at  time  t.  Hence,  the  open  loop  optimization  is  to 
maximize  the  reward  of  the  system  given  the  control  choices 
available  to  every  sensor-time  2-tuple  (j,  /j.and  a  given  initial 
I-state  x  over  the  finite  time  horizon  'H.  Note  that  this  is  an 
open  loop  optimization  problem  and  does  not  consider  the 
control  to  be  a  function  of  the  particular  information  states 
that  are  encountered  along  an  information  trajectory. 

Next,  we  consider  a  randomization  of  the  control  choices 
available  to  any  given  sensor:  instead  of  the  control  u\  being 
deterministic,  i.e,  the  sensor  chooses  to  measure  exactly  one 
of  the  targets  in  its  FOV  at  time  t,  we  assume  that  the 
sensor  chooses  to  measure  one  of  the  targets  in  its  FOV 
with  a  certain  probability.  Let  us  denote  the  probabilities 
representing  the  randomized  policies  for  every  sensor  time 
tuple  (j,  t)  by  {it3tk}  where: 

7 Tjtk  =  Prob.{u{  =  k),  (6) 

i.e.,  the  probability  that  the  jth  sensor  at  time  t  chooses 
to  measure  the  kth  target  in  its  FOV.  Compactly,  we  shall 
denote  the  randomized  policy  for  the  sensor  time  2-tuple 
(j.  t)  by  1  Ij .  Also,  we  shall  denote  the  randomized  policies 
of  all  the  sensor-time  tuples  by  ft  =  { I  Ij }.  Given  the 
definitions  above,  the  total  reward  for  target  i  in  following 
the  composite  randomized  sensor  policy  ft  =  { i  I) }  is  given 
by  the  following: 


j(xi,n)  =  j(Xi,{n2'})=  £ 


S(i,  1) 


J(Xi,U  1 


S(*,l) 


S(i,h)  \  S(i,l) 

'UH 


TS(i,H) 

H,u 


S(i,H)  ■ 


(7) 


The  average  above  is  over  all  possible  choices  of  u^l't>  for 
all  possible  t,  £  PL.  Further,  the  total  reward  in  following  the 
randomized  policy  {11^ }  is  then  given  by: 

N 

j(x,n)  =  ^j(Xi,n).  (8) 

i- 1 


B.  Simulation  based  Information  Gradient  Technique 


In  the  following,  we  shall  use  gradient  ascent  to  find  a 
maximum  for  the  total  reward  of  the  system.  In  order  to  do 
this,  we  first  need  to  evaluate  the  gradient  for  every 
sensor-time  2-tuple  (j,  t).  In  particular,  we  can  show  that  the 
gradient  d'J  is  given  by  the  following: 

oir;  , 


dj 

d<k 


E  E 


1 


S(l,  1) 


1  ,U 


S(i,  1)” 


tS(1,H) 
H,u 


S(l,H )  ■ 


(9) 


To  see  why,  note  that  11)  explicitly  appears  only  in  the 
reward  expressions  of  the  targets  that  are  in  the  FOV  of 
sensor  j  at  time  t,  namely  TJ(f).  Hence,  the  gradient  only 
involves  contributions  from  these  targets.  Further,  note  that 
for  any  l  £  T^[t)  ,  by  definition  S(l,t)  =  j.  Hence,  the 
above  expression  implies  that  the  gradient  of  the  total  reward 
with  respect  to  the  probability  that  the  sensor-time  pair  ( j ,  t) 
measures  the  kth  target  in  its  field  of  view  is  given  by  the 
average  cost  of  the  information-trajectories  of  the  targets 
in  given  that  sensor  j  at  time  t  actually  chooses  to 

measure  the  kth  object  in  its  FOV. 

The  gradient  ascent  algorithm  is  the  following: 


n i 


vP 


dJ  ' 

an I 


(10) 


where  7  is  a  small  step  size  parameter,  and  Vp{]  denotes 
the  projection  of  a  vector  onto  the  space  of  probability 
vectors  P  (needed  because  the  policy  updated  policy  of 
sensor-time  pair  (j,  t)  is  not  gauranteed  to  fall  in  P). 

Of  course,  implementing  the  deterministic  gradient  ascent 
algorithm  above  entails  averaging  over  multiple  realizations 
of  the  information  trajectories  and  sensor  control  sequences. 
Instead,  we  use  a  stochastic  gradient  ascent  technique  utiliz¬ 
ing  only  one  sample  realization  of  the  information  trajectory. 
We  replace  in  Eq.  10  with  where 

allt  oI\.t 

=  j(J,t))(uj)ifu|  =  k 

Kk 

=  0,  o.w.  (11) 


where  lo  represents  a  sample  realization  of  the  information 
process,  and  ./W>t)  represents  the  information  gain  of 
the  targets  in  TJ(f)  for  that  particular  realization  of  the 
information  process. 


Remark  1.  Distributed  Implementation:  Suppose  that  we 
have  a  CPU  for  every  sensor-time  tuple  ( j ,  t).  This  processor 


evaluates  the  gradient  JFF  ,  and,  because  depends  only 

on  the  I-trajectories  of  the  targets  in  TJ  (t),  the  CPU  need 
only  simulate  trajectories  of  those  targets.  Thus  the  CPU 
need  only  know  a  subset  of  the  policies  for  the  network  of 
sensors  -  those  for  sensors  that  can  see  the  targets  in  TJ  (t), 
that  is  nf  (l,T\  l  £  T-i  (f)  and  r  £  PL.  This  allows  for  a  sparse 
connection  graph  among  the  processors  thereby  facilitating  a 
distributed  implementation  of  the  gradient  ascent  algorithm. 

C.  Convergence 

The  stochastic  information  gradient  algorithm  is  guaran¬ 
teed  to  converge  to  one  of  the  set  of  Kuhn-Tucker  points  of 
the  function  J(x,  {H^ })  with  the  constraints  being  that  the 
randomized  policy  for  every  sensor-time  pair  (j,  t),  I  If  needs 
to  be  a  probability  vector. 

In  the  following  we  drop  all  reference  to  the  initial  I-state 
X  in  the  optimization  problem  for  J(x,  fl)  and  refer  to  the 
function  as  only  J(F1).  The  gradient  of  the  function  J(.) 
with  respect  to  fl  is  denoted  by  (/(FI).  Let  {/^(Fl)  <  0} 
denote  the  inequality  constraints  on  the  problem  for  some 
i  =  1,  ...K  and  h,  ( fl )  =  0  denote  the  equality  constraints 
for  some  i  =  1,  •  •  •  L.  Let  the  compact  set  defined  by  the 
constraints  above,  the  space  of  stochastic  policies,  be  denoted 
by  P.  Let  us  denote  the  set  of  stationary  points  of  J(.)  by 
S  where 

s  =  {n:S(n) 

-  £  AiVgi(n)  -  ^  pjVhiifV)  =0,  Xi>  0},  (12) 

*  j 

where  A,;  =  0  whenever  gi(n)  <  0  and  A;  >  0  otherwise. 
Let  the  step  size  parameters  satisfy  the  following  conditions: 

E  C»  =  00  >  E  en  <  00  * 

n  n 

Then  the  following  result  holds: 

Proposition  1.  The  sequence  of  policy  updates  Hn  —>  Si, 
for  some  unique  i  almost  surely,  i.e.,  the  stochastic  gradient 
algorithm  (Eq.  10,  11)  converges  to  a  set  of  stationary  (K-T) 
points  such  that  the  value  of  J(.)  on  each  such  set  Si  is 
constant. 

D.  Receding  Horizon  Control 

We  have  presented  a  simulation-based  stochastic  gradient 
technique  to  get  a  maximum  of  the  total  reward  J(x,  Fl)  with 
respect  to  the  sensor-time  randomized  policies  Fl  =  { llj  } 
given  some  initial  information  state  In  the  following,  we 
recursively  solve  such  open  loop  optimization  problems  at 
every  time  step  given  the  current  information  state  to  obtain 
a  receding  horizon  solution  to  the  sensor  scheduling  problem 
for  multiple  sensors. 

Suppose  at  the  initial  time  the  information  state  of  the 
system  is  xo-  Then,  given  this  initial  information  state,  we 
use  the  stochastic  information  gradient  technique  presented 
previously  to  obtain  a  maximum  for  the  total  information 
reward  of  the  system  over  the  randomized  policies  of  every 
sensor-time  pair  ( j ,  t)  over  some  given  horizon  H.  Then,  we 


implement  the  first  time  step  of  the  policies  for  every  sensor 
j,  and  take  measurements  of  the  targets  as  specified  by  the 
control  policies  at  the  first  time  step.  Then,  we  use  suitable 
filtering  techniques  to  update  the  information  state  of  the 
system  to  obtain  a  new  information  state  yf .  Then,  we  set 
Xo  =  x' ■  tLiid  repeat  the  information  gradient  technique  to 
obtain  a  minimum  of  the  total  reward  over  the  next  horizon 
TL  given  the  new  information  state  yf.  The  technique  can  be 
summarized  in  the  I-RHC-M  algorithm  below. 


Algorithm  1  Algorithm  I-RHC-M 

•  Given  initial  information  state  y.o  ar,d  lookahead  hori¬ 
zon  TL 

1)  Use  the  stochastic  information  gradient  technique 
(Eq.  10,  11)  to  obtain  a  minimum  of  the  total 
reward  J(xo,{II(})  over  all  sensor-time  pairs 
(j,  t)  over  the  horizon  TL. 

2)  Output  converged  T-step  policy  Itf*  for  every 
sensor-time  pair  (j,  t) 

3)  Observe  noisy  measurement  z  based  on  the  first 
step  of  policy  {II)!"}  and  update  information  state 
using  a  suitable  filter  to  obtain  the  new  I-state  %t- 

4)  Set  xo  =  Xi  anc*  g°  to  Step  1. 

•  End 


IV.  Illustrative  Example 

In  this  section,  we  shall  apply  the  I-RHC-M  technique  de¬ 
veloped  in  the  previous  section  to  a  simple  problem  involving 
multiple  simple  pendulums  that  mimics  the  SSA  problem. 
Although  the  problem  is  relatively  simple,  nevertheless  it 
is  so  high  dimensional  that  no  existing  technqiue  in  the 
literature  can  be  used  to  solve  it.  We  consider  a  set  of  N 
simple  (nonlinear)  pendulums  subject  to  white  process  noise. 
We  have  access  to  M  noisy  angle  sensors  with  disjoint  FOVs. 

Note  that  the  above  simple  problem  has  the  flavor  of  the 
SSA  problem,  in  that  each  sensor  has  a  bounded  FOV,  and 
can  measure  a  target  if  and  only  if  its  within  the  FOV. 
Further,  the  pendulum  problem  is  periodic  like  the  SSA 
problem.For  the  numerical  examples  below,  we  apply  our 
I-RHC-M  technique  to  a  situation  where  there  are  4  targets 
(TV  =  4)  and  3  sensors  to  measure  them  (M  =  3).  The 
initial  states  of  the  pendulums  are  chosen  randomly  and 
we  assume  that  the  noise  statistics  are  identical  for  each 
of  the  pendulums  an  sensors.  We  assume  that  the  Gaussian 
assumption  holds  in  this  problem  and  use  extended  Kalman 
filters  to  approximate  the  filtered  densities,  or  I-states  of 
the  pendulums.  The  information  gain  metric  used  in  this 
work  is  the  difference  in  the  determinants  of  the  information 
(inverse  of  the  covariance)  matrix  of  the  targets  and  the 
total  information  gain  is  the  sum  of  the  information  gains 
of  the  different  targets.  The  state  space  of  each  pendulum 
is  2  dimensional.  Given  that  the  Gaussian  approximation 
holds,  the  I-state  of  every  pendulum  can  be  specified  by  its 
mean  and  covariance  and  hence,  the  I-state  of  each  pendulum 
is  6  dimensional.  Thus,  given  4  pendulums,  the  joint  state 
space  of  the  problem  is  24  dimensional.  None  of  the  existing 
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Fig.  1.  Performance  of  the  I-RHC-M  algorithm  in  the  average  case  scenario 


Test  Number 

Fig.  2.  Performance  of  the  I-RHC-M  algorithm  in  the  best  case  scenario 

techniques  can  solve  such  a  high  dimensional  problem,  given 
even  the  extensive  computing  resources  available  today  (the 
highest  dimensional  DP  problem  that  can  be  solved  is  usually 
6  to  8  dimensional).  Thus,  even  this  simple  example,  shows 
the  degree  of  computational  complexity  that  is  inherent  in 
the  problem  and  to  the  best  of  our  knowledge,  the  I-RHC-M 
procedure  is  the  only  one  that  is  capable  of  tackling  such 
problems. 

The  results  of  our  numerical  simulations  are  shown  in 
Figs.  1  and  2.  For  comparison,  we  chose  a  “greedy”  myopic 
policy  as  it  is  the  only  technique,  other  than  the  I-RHC- 
M  technique,  that  scales  to  high  dimensional  problems  such 
as  this.  The  I-RHC-M  technique  had  a  lookahead  horizon 
of  10  timesteps  and  the  information  gains  were  evaluated 
over  a  total  time  of  20  timesteps.  In  Fig.  1,  we  show  the 
average  gain/  loss  of  the  I-RHC-M  method,  averaged  over 
three  runs  of  the  I-RHC-M  technique,  compared  to  that  of 
the  greedy  policy,  for  twenty  different  initial  conditions,  i.e., 
we  run  the  I-RHC-M  technique  three  different  times  for 


each  initial  condition  and  compare  the  average  information 
gain  over  these  runs  to  the  information  gain  of  the  greedy 
policy.  Note  that  the  1-RHC-M  policies  will,  in  general,  be 
different  for  different  runs  due  to  the  stochasticity  of  the 
algorithm.  In  Fig. 2,  we  compare  the  information  gain  of 
the  best  of  the  three  I-RHC-M  runs  to  the  information  gain 
of  the  greedy  policy.  Note  that  there  is  no  guarantee  that 
the  I-RHC-M  policy  can  beat  the  greedy  policy,  at  least 
theoretically.  However,  as  can  be  seen  from  the  plots,  the 
I-RHC-M  technique  does  beat  the  greedy  policy  most  of  the 
time.  The  I-RHC-M  technique  provided  an  improvement  of 
approximately  13%  over  the  greedy  policy  in  the  averaged 
case,  and  an  improvement  of  20%  in  the  best  of  three  case. 

It  should  be  noted  that,  if  we  use  a  reward  function 
that  is  the  product  of  the  individual  reward  functions,  the 
reward  gains  over  the  greedy  policy  are  much  larger  (on 
the  order  of  10  times  better),  and  this  structure  is  more 
consistent  with  the  description  of  the  reward  function  as  the 
amount  of  information  gained  about  the  system  of  targets.  In 
order  to  maintain  the  parallelization-friendly  characteristics 
of  the  reward  function  from  Remark  1,  the  logarithm  of 
this  function  must  be  used  as  the  reward  function.  We 
have  not  yet  obtained  results  using  this  reward  function, 
but  our  conjecture  is  that  we  will  be  able  realize  very  large 
information  gains  with  it. 

V.  Conclusion 

In  this  paper,  we  have  introduced  an  information  space 
receding  horizon  control  technique  for  multi-agent  systems, 
termed  the  I-RHC-M  technique,  with  application  to  the 
SSA  problem.  The  method  is  based  on  a  simulation  based 
stochastic  gradient  technique  that  is  used  to  solve  a  fi¬ 
nite  horizon  stochastic  optimization  problem  recursively  at 
every  time  step,  thereby  providing  a  feedback  solution  to 
the  problem.  We  have  shown  that  the  method  is  highly 
parallelizable  and  naturally  inherits  a  variance  reduction 
property  owing  to  its  structure.  We  have  also  shown  that 
the  method  is  capable  of  handling  very  high  dimensional 
continuous  state  and  observation  space  problems  for  multi¬ 
agent  systems  that  no  other  existing  technique  can  claim  to 
solve.  We  have  tested  our  technique  on  a  simple  example, 
which  is  nonetheless  computationally  intractable  for  other 
existing  solution  techniques,  and  have  shown  that  the  method 
achieves  significant  improvement  over  a  greedy  policy  (the 
only  other  computationally  viable  strategy). 
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